Advanced computer architecture
Uploaded by md-mahedi-mahfuj
![Page 1: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/1.jpg)
CSE 8383 - Advanced Computer Architecture
Week 3: Week of Jan 26, 2004
engr.smu.edu/~rewini/8383
![Page 2: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/2.jpg)
Contents
- Linear Pipelines
- Nonlinear Pipelines
- Instruction Pipelines
- Arithmetic Operations
- Design of Multifunction Pipeline
![Page 3: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/3.jpg)
Linear Pipeline
- Processing stages are linearly connected
- Perform a fixed function
- Synchronous pipeline
  - Clocked latches between Stage i and Stage i+1
  - Equal delays in all stages
- Asynchronous pipeline (handshaking)
![Page 4: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/4.jpg)
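Latches
[Figure: stages S1, S2, S3 separated by latches L1 and L2]
- With equal stage delays, the stage delay sets the clock period
- Otherwise, the slowest stage determines the delay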
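The rule that the slowest stage sets the clock can be sketched as below. The stage delays are made-up numbers, and the `latch_delay` parameter is an assumption that the slide does not mention.

```python
def clock_period(stage_delays, latch_delay=0.0):
    """Clock period of a synchronous pipeline: slowest stage plus latch overhead.

    latch_delay is an assumed extra term; the slide only names the stage delays.
    """
    return max(stage_delays) + latch_delay


# Hypothetical delays (in ns) for stages S1, S2, S3.
balanced = [10.0, 10.0, 10.0]
unbalanced = [8.0, 14.0, 10.0]

print(clock_period(balanced))    # every stage fits the same period
print(clock_period(unbalanced))  # the 14 ns stage dictates the period
```

With unbalanced stages, the faster stages sit idle for part of each cycle, which is why equal delays are preferred.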
![Page 5: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/5.jpg)
Reservation Table

| Stage \ Time | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| S1 | X |   |   |   |
| S2 |   | X |   |   |
| S3 |   |   | X |   |
| S4 |   |   |   | X |
![Page 6: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/6.jpg)
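5 tasks on 4 stages

| Stage \ Time | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| S1 | X | X | X | X | X |   |   |   |
| S2 |   | X | X | X | X | X |   |   |
| S3 |   |   | X | X | X | X | X |   |
| S4 |   |   |   | X | X | X | X | X |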
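A minimal sketch of the timing behind this diagram: in a linear pipeline with k stages, n tasks finish in k + n - 1 cycles, assuming one task enters per cycle and there are no stalls. The function names are illustrative and not taken from the slides.

```python
def total_cycles(n_tasks, k_stages):
    """Cycles to finish n tasks on a k-stage linear pipeline with no stalls."""
    return k_stages + n_tasks - 1


def speedup(n_tasks, k_stages):
    """Speedup over running each task through all k stages one at a time."""
    return (n_tasks * k_stages) / total_cycles(n_tasks, k_stages)


def space_time(n_tasks, k_stages):
    """Return one text row per stage; 'X' marks a busy cycle, '.' an idle one."""
    width = total_cycles(n_tasks, k_stages)
    rows = []
    for s in range(k_stages):
        # Task t (0-based) occupies stage s during cycle t + s (0-based).
        cells = ["X" if 0 <= c - s < n_tasks else "." for c in range(width)]
        rows.append(f"S{s + 1} " + " ".join(cells))
    return rows


print(total_cycles(5, 4))   # 8 cycles, matching the diagram
print(speedup(5, 4))        # 20 stage-cycles of work done in 8 cycles
for row in space_time(5, 4):
    print(row)
```

As n grows, the speedup approaches k, the number of stages.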
![Page 7: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/7.jpg)
Nonlinear Pipelines
- Variable functions
- Feed-forward
- Feedback
![Page 8: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/8.jpg)
3 stages & 2 functions
[Figure: pipeline with stages S1, S2, S3 supporting two functions, X and Y]
![Page 9: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/9.jpg)
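Reservation Tables for X & Y
[Figure: two reservation tables over stages S1, S2, S3. Function X marks S1 three times, S2 twice, and S3 three times; function Y marks S1 twice, S2 once, and S3 three times.]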
![Page 10: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/10.jpg)
Linear Instruction Pipelines
Assume the following instruction execution phases:
- Fetch (F)
- Decode (D)
- Operand Fetch (O)
- Execute (E)
- Write results (W)
![Page 11: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/11.jpg)
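Pipeline Instruction Execution

| Stage \ Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| F | I1 | I2 | I3 |    |    |    |    |
| D |    | I1 | I2 | I3 |    |    |    |
| O |    |    | I1 | I2 | I3 |    |    |
| E |    |    |    | I1 | I2 | I3 |    |
| W |    |    |    |    | I1 | I2 | I3 |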
![Page 12: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/12.jpg)
Dependencies
- Data dependency (operand is not ready yet)
- Instruction dependency (branching)

Will that cause a problem?
![Page 13: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/13.jpg)
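Data Dependency

I1: Add R1, R2, R3
I2: Sub R4, R1, R5

| Stage \ Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| F | I1 | I2 |    |    |    |    |
| D |    | I1 | I2 |    |    |    |
| O |    |    | I1 | I2 |    |    |
| E |    |    |    | I1 | I2 |    |
| W |    |    |    |    | I1 | I2 |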
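The hazard in this diagram can be checked with a short sketch. It assumes that the first register named in an instruction is its destination (so I1 writes R1), that operands are read in stage O, and that results are written in stage W. Forwarding is not modeled, and the function names are illustrative.

```python
STAGES = ["F", "D", "O", "E", "W"]


def stage_cycle(issue_cycle, stage):
    """Cycle in which an instruction fetched at issue_cycle occupies `stage`."""
    return issue_cycle + STAGES.index(stage)


def stalls_needed(producer_issue, consumer_issue, same_cycle_ok=False):
    """Stall cycles so the consumer's operand fetch sees the producer's write.

    same_cycle_ok=True models a register file that can be written and read
    in one cycle (one of the listed remedies).
    """
    write_cycle = stage_cycle(producer_issue, "W")
    read_cycle = stage_cycle(consumer_issue, "O")
    earliest_read = write_cycle if same_cycle_ok else write_cycle + 1
    return max(0, earliest_read - read_cycle)


# I1 is fetched in cycle 1 and I2 in cycle 2, as in the diagram.
print(stage_cycle(1, "W"))   # I1 writes R1 in cycle 5
print(stage_cycle(2, "O"))   # I2 wants to read R1 in cycle 4, one cycle too early
print(stalls_needed(1, 2))
print(stalls_needed(1, 2, same_cycle_ok=True))
```

Under these assumptions I2 needs two stall cycles, or one if the register file supports a write and a read in the same cycle.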
![Page 14: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/14.jpg)
Solutions
- Stall
- Forwarding
- Write and read in one cycle
- ...
![Page 15: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/15.jpg)
Instruction Dependency
I1 – Branch o
I2 –
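
| Stage \ Cycle | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| F | I1 | I2 |    |    |    |    |
| D |    | I1 | I2 |    |    |    |
| O |    |    | I1 | I2 |    |    |
| E |    |    |    | I1 | I2 |    |
| W |    |    |    |    | I1 | I2 |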
![Page 16: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/16.jpg)
Solutions
- Stall
- Predict branch taken
- Predict branch not taken
- ...
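A rough sketch of what stalling costs and what prediction can save. It assumes the branch outcome is known once the branch finishes a given stage (E by default) and that a wrong prediction costs as much as a full stall. The slides do not state either point, and the misprediction rate is a made-up input.

```python
STAGES = ["F", "D", "O", "E", "W"]


def branch_stall_cycles(resolve_stage="E"):
    """Bubbles when the next fetch waits until the branch finishes resolve_stage.

    The branch is fetched in cycle 1 and finishes resolve_stage in cycle
    index + 1, so the next fetch slips from cycle 2 to cycle index + 2.
    """
    return STAGES.index(resolve_stage)


def average_penalty(mispredict_rate, resolve_stage="E"):
    """Expected bubbles per branch when only mispredictions pay the stall."""
    return mispredict_rate * branch_stall_cycles(resolve_stage)


print(branch_stall_cycles("E"))   # always stalling costs 3 cycles per branch
print(average_penalty(0.5))       # hypothetical 50% misprediction rate
```

The better the prediction, the closer the average penalty gets to zero, which is the motivation for the two prediction schemes listed.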
![Page 17: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/17.jpg)
Floating Point Multiplication
- Inputs: (Mantissa1, Exponent1), (Mantissa2, Exponent2)
- Add the two exponents to get Exponent-out
- Multiply the two mantissas
- Normalize the mantissa and adjust the exponent
- Round the product mantissa to a single-length mantissa; the exponent may need adjusting
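The steps above can be sketched as follows. This is a simplified model rather than IEEE 754: it assumes base 2, mantissas stored as fractions with 0.5 <= |m| < 1, no exponent bias, and a hypothetical `bits` parameter for the single-length mantissa width.

```python
def fp_multiply(m1, e1, m2, e2, bits=8):
    """Multiply (m1 * 2**e1) by (m2 * 2**e2); returns (mantissa, exponent)."""
    exp = e1 + e2                       # add the two exponents
    man = m1 * m2                       # multiply the two mantissas
    if man == 0:
        return 0.0, 0
    while abs(man) < 0.5:               # normalize and adjust the exponent
        man *= 2
        exp -= 1
    scale = 2 ** bits
    man = round(man * scale) / scale    # round to `bits` fraction bits
    if abs(man) >= 1.0:                 # rounding carried out: renormalize
        man /= 2
        exp += 1
    return man, exp


# 3.0 = 0.75 * 2**2 and 4.0 = 0.5 * 2**3; the product 12.0 is 0.75 * 2**4.
print(fp_multiply(0.75, 2, 0.5, 3))
```

The final `if` is the case the last bullet hints at: rounding can push the mantissa up to 1.0, which forces one more exponent adjustment.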
![Page 18: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/18.jpg)
Linear Pipeline for Floating-Point Multiplication
[Figure: two pipeline diagrams. Stage labels in the first: Add Exponents, Multiply Mantissa, Normalize, Round. Stage labels in the second: Partial Products, Accumulator, Add Exponents, Normalize, Round, Renormalize.]
![Page 19: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/19.jpg)
Linear Pipeline for Floating-Point Addition
[Figure: pipeline diagram. Stage labels: Partial Shift, Add Mantissa, Subtract Exponents, Find Leading 1, Round, Renormalize, Partial Shift.]
![Page 20: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/20.jpg)
Combined Adder and Multiplier
[Figure: combined pipeline with eight stages labeled A to H. Stage labels: Partial Shift, Add Mantissa, Exponents Subtract / Add, Find Leading 1, Round, Renormalize, Partial Shift, Partial Products.]
![Page 21: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/21.jpg)
Reservation Table for Multiply
1 2 3 4 5 6 7
A X
B X X
C X X
D X X
E X
F
G
H
![Page 22: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/22.jpg)
Reservation Table for Addition
1 2 3 4 5 6 7 8 9
A Y
B
C Y
D Y
E Y
F Y Y
G Y
H Y Y
![Page 23: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/23.jpg)
Nonlinear Pipeline Design
- Latency: the number of clock cycles between two initiations of a pipeline
- Collision: a resource conflict
- Forbidden latencies: latencies that cause collisions
![Page 24: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/24.jpg)
Nonlinear Pipeline Design (cont.)
- Latency sequence: a sequence of permissible latencies between successive task initiations
- Latency cycle: a sequence that repeats the same subsequence
- Collision vector: $C = (C_m, C_{m-1}, \ldots, C_2, C_1)$, with $m \le n - 1$
  - $n$ = number of columns in the reservation table
  - $C_i = 1$ if latency $i$ causes a collision, 0 otherwise
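A minimal sketch of the collision vector definition: given a set of forbidden latencies, build the bit string from C_m down to C_1 and test whether a latency is permissible. The function names are illustrative. The sample sets are the forbidden latencies quoted on later slides of this deck.

```python
def collision_vector(forbidden):
    """Return C = (C_m, ..., C_1) as a string; C_i is 1 if latency i is forbidden."""
    m = max(forbidden)                      # m is the largest forbidden latency
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


def is_permissible(vector, latency):
    """True if starting a new task `latency` cycles later causes no collision."""
    m = len(vector)
    # Bit C_latency sits at index m - latency because the string starts at C_m.
    return latency > m or vector[m - latency] == "0"


print(collision_vector({1, 2}))         # 11
print(collision_vector({2, 4, 5, 7}))   # 1011010
print(collision_vector({2, 4}))         # 1010
print(is_permissible("1011010", 3))     # True: C_3 is 0
print(is_permissible("1011010", 2))     # False: C_2 is 1
```

Any latency greater than m is always permissible, which is why the vector never needs more than m bits.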
![Page 25: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/25.jpg)
Mul – Mul Collision (launch after 1 cycle)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X Z X
E X Z
F
G
H
![Page 26: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/26.jpg)
Mul – Mul Collision (launch after 2 cycles)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X X Z
E X
F
G
H
![Page 27: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/27.jpg)
Mul – Mul Collision (launch after 3 cycles)
1 2 3 4 5 6 7
A X Z
B X X Z Z
C X X Z Z
D X X
E X
F
G
H
![Page 28: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/28.jpg)
Collision Vector for Multiply after Multiply
- Forbidden latencies: 1, 2
- Collision vector: 0 0 0 0 1 1, which shortens to 1 1
- Maximum forbidden latency = 2, so m = 2
![Page 29: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/29.jpg)
Example
[Figure: pipeline with stages S1, S2, S3 supporting two functions, X and Y]
![Page 30: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/30.jpg)
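Reservation Tables for X & Y
[Figure: two reservation tables over stages S1, S2, S3. Function X marks S1 three times, S2 twice, and S3 three times; function Y marks S1 twice, S2 once, and S3 three times.]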
![Page 32: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/32.jpg)
Forbidden Latencies
- X after X
- X after Y
- Y after X
- Y after Y
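All four cases can be computed the same way. If the first task uses a stage at time t1 and the second task uses that stage at time t2, then a latency of t1 - t2 (when positive) makes them collide. The column positions of the X and Y reservation tables did not survive in this transcript. The tables below are therefore an assumed layout, chosen to agree with the per-stage mark counts and with the forbidden latencies stated elsewhere in this deck (2, 4, 5, 7 for X after X; 2, 4 for Y after Y).

```python
def forbidden_latencies(first, second):
    """Latencies at which `second`, started after `first`, collides with it.

    Each table maps a stage name to the cycles in which that stage is used.
    """
    result = set()
    for stage, first_cycles in first.items():
        for t_first in first_cycles:
            for t_second in second.get(stage, []):
                if t_first > t_second:
                    # second starts (t_first - t_second) cycles later and
                    # reaches this stage exactly when first is using it.
                    result.add(t_first - t_second)
    return result


# ASSUMED layouts (not recoverable from the transcript).
X = {"S1": [1, 6, 8], "S2": [2, 4], "S3": [3, 5, 7]}
Y = {"S1": [1, 5], "S2": [3], "S3": [2, 4, 6]}

print(sorted(forbidden_latencies(X, X)))   # X after X
print(sorted(forbidden_latencies(Y, Y)))   # Y after Y
print(sorted(forbidden_latencies(Y, X)))   # X after Y, under the assumed layout
print(sorted(forbidden_latencies(X, Y)))   # Y after X, under the assumed layout
```

The two cross cases depend entirely on the assumed column positions, so treat those results as illustrative only.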
![Page 33: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/33.jpg)
X after X
[Figure: two reservation tables overlaying initiations X1 and X2 on stages S1, S2, S3, illustrating collisions at latencies 5 and 2]
![Page 34: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/34.jpg)
X after X
[Figure: two reservation tables overlaying initiations X1 and X2 on stages S1, S2, S3, illustrating collisions at latencies 4 and 7]
![Page 35: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/35.jpg)
Collision Vector
- Forbidden latencies: 2, 4, 5, 7
- Collision vector = 1 0 1 1 0 1 0
![Page 36: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/36.jpg)
Y after Y
[Figure: two reservation tables overlaying successive initiations of Y on stages S1, S2, S3]
![Page 37: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/37.jpg)
Collision Vector
- Forbidden latencies: 2, 4
- Collision vector = 1 0 1 0
![Page 38: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/38.jpg)
Exercise – Find the collision vector
1 2 3 4 5 6 7
A X X X
B X X
C X X
D X
![Page 39: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/39.jpg)
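State Diagram for X
- Initial state 1 0 1 1 0 1 0
  - latency 1* → 1 1 1 1 1 1 1
  - latency 3 or 6 → 1 0 1 1 0 1 1
  - latency 8+ → 1 0 1 1 0 1 0
- State 1 1 1 1 1 1 1
  - latency 8+ → 1 0 1 1 0 1 0
- State 1 0 1 1 0 1 1
  - latency 3* or 6 → 1 0 1 1 0 1 1
  - latency 8+ → 1 0 1 1 0 1 0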
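A sketch of how such a state diagram can be generated from the collision vector alone. For each permissible latency p (bit C_p is 0), shift the current state right by p and OR it with the initial collision vector. Any latency above m returns to the initial state. The function name and the dictionary layout are illustrative choices.

```python
def state_diagram(initial):
    """Map each reachable state to {latency: next_state}.

    The key m + 1 stands for 'm + 1 or more' (8+ for a 7-bit vector).
    """
    m = len(initial)
    init = int(initial, 2)                  # bit 0 of the integer is C_1

    def fmt(value):
        return format(value, f"0{m}b")

    diagram, pending = {}, [init]
    while pending:
        state = pending.pop()
        if fmt(state) in diagram:
            continue
        edges = {}
        for latency in range(1, m + 1):
            if not (state >> (latency - 1)) & 1:     # C_latency is 0: permissible
                nxt = (state >> latency) | init      # shift, then OR with initial
                edges[latency] = fmt(nxt)
                pending.append(nxt)
        edges[m + 1] = initial                       # long latencies reset the state
        diagram[fmt(state)] = edges
    return diagram


for state, edges in state_diagram("1011010").items():
    print(state, edges)
```

For the vector 1011010 this yields three states: 1011010, 1111111, and 1011011.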
![Page 40: Advanced computer architecture](https://reader031.fdocuments.us/reader031/viewer/2022020115/55504361b4c90580748b4c00/html5/thumbnails/40.jpg)
Cycles
- Simple cycles: each state appears only once
  - (3), (6), (8), (1, 8), (3, 8), and (6, 8)
- Greedy cycles: simple cycles whose edges are all made with minimum latencies from their respective starting states
  - (1, 8) and (3); one of them gives the MAL (minimum average latency)
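A sketch of finding the greedy cycles and comparing their average latencies, using the collision vector 1011010 from the X example. From each state, always take the smallest permissible latency until a state repeats. The helper names are illustrative.

```python
def next_state(state, latency, init):
    """Shift the state right by `latency` and OR with the initial vector."""
    m = len(init)
    if latency > m:
        return init
    return format((int(state, 2) >> latency) | int(init, 2), f"0{m}b")


def min_latency(state):
    """Smallest permissible latency from `state` (m + 1 if every bit is 1)."""
    m = len(state)
    for latency in range(1, m + 1):
        if state[m - latency] == "0":       # C_latency is at index m - latency
            return latency
    return m + 1


def greedy_cycle(start, init):
    """Follow minimum-latency edges from `start` until a state repeats."""
    seen, path, state = {}, [], start
    while state not in seen:
        seen[state] = len(path)
        latency = min_latency(state)
        path.append(latency)
        state = next_state(state, latency, init)
    return tuple(path[seen[state]:])


INIT = "1011010"
for start in ("1011010", "1011011"):
    cycle = greedy_cycle(start, INIT)
    print(cycle, sum(cycle) / len(cycle))
```

The cycle (1, 8) averages 4.5 cycles per initiation and (3) averages 3, so under this sketch the constant cycle (3) is the one that achieves the MAL.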