Post on 27-Jul-2018
1
Fundamental Algorithms for System Modeling, Analysis, and OptimizationAnalysis, and Optimization
Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros TripakisUC BerkeleyEECS 144/244Spring 2013Spring 2013
Copyright © 2010-13, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved
Lecture: Timing Analysis, Retiming
Thanks to Kurt Keutzer for several slides
RTL Synthesis Flow
RTLS
HDLHDL
Simulation/ Verification
FSM,Verilog,VHDL
Synthesis
netlist
logicoptimization
netlist
Library/modulegenerators
a
b
s
q0
1
d
clk
a
bq
0
1
d
Boolean circuit/network
Boolean circuit/network
EECS 144/244, UC Berkeley: 2
physicaldesign
layout
K. Keutzer
s clk
Graph / RectanglesTiming Analysis
2
Timing Analysis / Verification
Verifying a property about system timing
Arises in many settings: Integrated circuits Integrated circuits
Embedded software
Distributed embedded systems
Biological systems
…
EECS 144/244, UC Berkeley: 3
Here we focus on circuits.
Chapter 15 of Lee-Seshia book contains a more broad discussion.
Timing Analysis for Digital ICs
(Clock) Speed is one of the major performance metrics for digital circuits
Timing Analysis = the process of verifying that a chip meets its speed requirementE.g., 1 GHz means that next-state function must be
computed within 1 ns
EECS 144/244, UC Berkeley: 4
3
Static Timing Analysis for Circuits
Determine fastest permissible clock speed (e.g. 1 GHz)
clk
Combinationallogic
clk
Combinationallogic
clk
Combinationallogic
EECS 144/244, UC Berkeley: 5
p p ( g )by determining delay of longest path from register to register (e.g. 1ns.)
Cycle Time - Critical Path Delay
Cycle time (T) cannot be smaller than longest path delay (Tmax) data
cycle time
p y ( max)
Longest (critical) path delay is a function of:
Total gate, wire delays
Tclock1
data
maxT T
EECS 144/244, UC Berkeley: 6
clock
Q1 Q2
Tclock1 Tclock2critical path,
~5 logic levels
4
Cycle Time - Setup Time
For FFs to correctly latch data, input data must be stable during: data
setup time
be stable during:
Setup time (Tsetup) beforeclock arrives
Tclock1
data
max setupT T T
EECS 144/244, UC Berkeley: 7
clock
Q1 Q2
Tclock1 Tclock2critical path,
~5 logic levels
Cycle Time - Clock-skew
Tclock2
T
dataIf clock network has unbalanced delay – clock skew Tclock1
Q2clock skew
Q2
skew
Cycle time is also a function of clock skew (Tskew)
max setup skewT T T T
EECS 144/244, UC Berkeley: 8
clock
Q1 Q2
Tclock1 Tclock2
critical path, ~5 logic levels
5
Cycle Time - Clock to Q
Cycle time is also a function of propagation delay of
Tclock1
T l k2
data
p p g yFF (Tclk-to-Q)
Tclk-to-Q : time from arrival of clock signal till change at FF output)
Q
Tclock2
Q2clock-to-Q
Q
max setup skew clk to QT T T T T
EECS 144/244, UC Berkeley: 9
clock
Q1 Q2
Tclock1 Tclock2
Q2
critical path, ~5 logic levels
Min Path Delay - Hold Time
For FFs to correctly latch data, input data must be stable during Hold time T
data
hold time
stable during Hold time (Thold) after clock arrives
Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)
Tclock1
min hold skewT T T
EECS 144/244, UC Berkeley: 10
clock
Q1 Q2
Tclock1 Tclock2short path, ~3
logic levels
6
Summary
Need to be able to compute
Tmin: delay of “fastest” path in the circuit
Tmax: delay of “slowest” (critical) path in the circuit
EECS 144/244, UC Berkeley: 11
Modeling Timing in a Combinational Circuit
Arrival time in green A 2
1
0.20
10
X
Z
W
.05
C
B
f
2
2
1
1
0
.20
.20
.10
YZ
.15
.05
1
Interconnect delay in red
Gate delay in blue
What’s the right mathematical
EECS 144/244, UC Berkeley: 13
What s the right mathematical object to use to represent this physical object?
7
Modeling - 1
Use a labeled directed graph
G = <V E>
A 21
0.20
10
X
Z
W
.05
X0.05A
2
G = <V,E>
Vertices represent gates, primary inputs and primary outputs
Edges represent wires
C
B
f
2
2
1
1
0
.20
.20
.10
YZ
.15
.05
1
EECS 144/244, UC Berkeley: 14
C
B
f
Y
W .05.1
1
.2
.15.20
.20
12
2
2
Z
Labels represent delays
What about arrival times?
0
0
1
s
Modeling - 2
C
X0
0
A
15.201
2
2
Find longest/shortest paths in a directedgraph G = <V,E>
C
B
f
Y
W .05.1
1
.2
0
1
.15
.202
2
Z
g ap G ,
What sort of directed graph do we have?
Is this in the standard form for a
s
EECS 144/244, UC Berkeley: 15
longest/shortest path problem?
8
Split Nodes into Edges
C
X0
05.10
A
.15.201
2
2C
B
f
Y
W .05.1
1
.2
0
1.20
2
Z
22
s
EECS 144/244, UC Berkeley: 16
2 1
0.5 2.5
3
DAG with Weighted Edges
X0
A
Cf
Y
W1.050.1
1
0.2
0
0
1
2.15
2.2
2.2Zs
EECS 144/244, UC Berkeley: 17
B1
Problem: Find the longest (critical) path from source s to sink f.
9
Naïve Approach: Enumerate Paths
A
How many paths in this example?In the worst case?
Cf
X
W
0
1.05.1
.2
0
0
A
2.15
2.2
2 2Zs
EECS 144/244, UC Berkeley: 18
B
Y11
2.2
Problem: Find the longest path from source s to sink f.
Algorithm 1: Longest path in a DAG
Critical Path Method [Kirkpatrick 1966, IBM JRD](dynamic programming)
Let w(u,v) denote weight of edge from u to v
Steps:
1. Topologically sort vertices (graph is acyclic)
order: v1, v2, …, vn v1 = s, vn = ?
2. For each vertex v, compute
EECS 144/244, UC Berkeley: 19
d(v) = length of longest path from source s to v
d(v1) = 0
For i = 2..n
d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)
10
Algorithm 1: Longest path in a DAG
Critical Path Method [Kirkpatrick 1966, IBM JRD]
Let w(u,v) denote weight of edge from u to v
Steps:
1. Topologically sort vertices
order: v1, v2, …, vn v1 = s, vn = f
2. For each vertex v, compute
d( ) l th f l t th f t
Time Complexity?O(m+n)
EECS 144/244, UC Berkeley: 20
d(v) = length of longest path from source s to v
d(v1) = 0
For i = 2..n
d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)
Run the CPM on our example
Graph vs. Circuit
Delay of a combinational circuit depends on Circuit topology (graph model)
Delay model (e.g., fixed vs. variable)
Boolean behavior of gates
We have only considered circuit topology so far.
EECS 144/244, UC Berkeley: 21
Longest/shortest path found on the graph can be very pessimistic: Paths can be FALSE
Delay values are BOUNDS
11
False Paths (consider Transition Mode)
A path is false if it cannot be responsible for the delay of a circuit
EECS 144/244, UC Berkeley: 22
Graph model implies path of length 6
False Paths
A path is false if it cannot be responsible for the delay of a circuit
EECS 144/244, UC Berkeley: 23
Graph model implies path of length 6
12
False vs. True Paths
TRUE path = one that can be responsible for the delay of a circuita circuit
Need techniques to find whether a path is TRUE or FALSE
EECS 144/244, UC Berkeley: 24
The Fixed Delay Model: Constant delay for each gate (or wire) Typo: this should be 1
EECS 144/244, UC Berkeley: 25
13
Paradoxical Behavior with the Transition Model?Typo: this should be 1
non-monotonic
EECS 144/244, UC Berkeley: 26
w.r.t. time delays!
Problems with Fixed Delay + Transition Model
1. Transition model can be tricky to reason about
Fi d d l li i d2. Fixed gate delays are unrealistic, due to manufacturing process variations
More realistic delay model: Lower and upper bounds
EECS 144/244, UC Berkeley: 27
Perform timing analysis for a whole family of circuits that share the same lower/upper bounds
14
Fixed Delays Bounded Delays
Want algorithms that report the critical path delay
of the slowest circuit in the circuit familyof the slowest circuit in the circuit family
EECS 144/244, UC Berkeley: 28
Paper: Computation of Floating Mode Delay in Combinational
Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993
(on bspace)
RETIMING
EECS 144/244, UC Berkeley: 29
15
Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 30
[Shenoy, 1997]
6
4
clock period =
# registers =
Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 31
5
4
clock period =
# registers =
[Shenoy, 1997]
16
Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 32
clock period =
# registers =
4
3
[Shenoy, 1997]
Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 33
clock period =
# registers =
2
4
[Shenoy, 1997]
17
Goals of Retiming
Possible goals:
• Minimize clock period (min-period retiming)
• Minimize number of registers (min-area retiming)Minimize number of registers (min area retiming)
• Minimize number of registers for a target clock period (constrained min-area retiming)
EECS 144/244, UC Berkeley: 34[Shenoy, 1997]
Abstraction:Circuit Graph
Nodes: circuit elementsNode weights: delay
EECS 144/244, UC Berkeley: 35
Dummy node for fanout
18
Abstraction:Registers?
EECS 144/244, UC Berkeley: 36
Abstraction - Registers
0
1
1
1
0
00 0
0 0 0
0
000
00
EECS 144/244, UC Berkeley: 37
Arc weights indicate registers
1
19
Abstraction - Environment
0
1
0
00 0
0 0 0
0
000
0
EECS 144/244, UC Berkeley: 38
11
1
0
Cutset – Divides the Graph in Two
0
1
0
00 0
0 0 0
0
000
0
EECS 144/244, UC Berkeley: 39
11
1
0
20
Retiming: Add a register on all arcs crossing the cutset in one direction, and subtract a register from all arcs crossing the cutset in the other direction.
0 0+1=1
1
0
00
00+1=1
0
0
000
0
0+1=1
EECS 144/244, UC Berkeley: 40
11-1=0
1-1=0
0
Recall: Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 41
[Shenoy, 1997]
6
4
clock period =
# registers =
21
Recall: Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 42
5
4
clock period =
# registers =
[Shenoy, 1997]
Simplest cutset surrounds one node
0
1
1-1=0
1-1=00
00
0+1=1 0
0
000
0
EECS 144/244, UC Berkeley: 43
10 0
0
22
Recall: Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 44
5
4
clock period =
# registers =
[Shenoy, 1997]
Recall: Retiming Tradeoffs
aa
b 1
1
12
2
2
EECS 144/244, UC Berkeley: 45
clock period =
# registers =
4
3
[Shenoy, 1997]
23
For each node v, define r (v) = # of registers moved from the outputs to the inputs.
0
r (v) = -1
A retiming is now an assignment r (v) for every node v such that the weight of every arc is non-negative.
1
1-1=0
1-1=00
00
0+1=1 0
0
000
0v
y g
EECS 144/244, UC Berkeley: 46
10 0
0
Rest of the story
Retiming operations = graph transformations
Each graph has a vector of costs: (clock period, # registers, …)
Formulate and solve optimization problem.
Papers (on bspace):
EECS 144/244, UC Berkeley: 47
Papers (on bspace):1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems."
Journal of VLSI and Computer Systems: pp. 41-67.
2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry." Algorithmica 6(1): pp. 5-35.
3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI Journal 22: pp. 1-21.