Fundamental Algorithms for System Modeling, Analysis, and ... · Timing Analysis for Digital ICs...

Post on 27-Jul-2018

217 views 0 download

Transcript of Fundamental Algorithms for System Modeling, Analysis, and ... · Timing Analysis for Digital ICs...

1

Fundamental Algorithms for System Modeling, Analysis, and OptimizationAnalysis, and Optimization

Edward A. Lee, Jaijeet Roychowdhury, Sanjit A. Seshia, Stavros TripakisUC BerkeleyEECS 144/244Spring 2013Spring 2013

Copyright © 2010-13, E. A. Lee, J. Roychowdhury, S. A. Seshia, S. Tripakis. All rights reserved

Lecture: Timing Analysis, Retiming

Thanks to Kurt Keutzer for several slides

RTL Synthesis Flow

RTLS

HDLHDL

Simulation/ Verification

FSM,Verilog,VHDL

Synthesis

netlist

logicoptimization

netlist

Library/modulegenerators

a

b

s

q0

1

d

clk

a

bq

0

1

d

Boolean circuit/network

Boolean circuit/network

EECS 144/244, UC Berkeley: 2

physicaldesign

layout

K. Keutzer

s clk

Graph / RectanglesTiming Analysis

2

Timing Analysis / Verification

Verifying a property about system timing

Arises in many settings: Integrated circuits Integrated circuits

Embedded software

Distributed embedded systems

Biological systems

EECS 144/244, UC Berkeley: 3

Here we focus on circuits.

Chapter 15 of Lee-Seshia book contains a more broad discussion.

Timing Analysis for Digital ICs

(Clock) Speed is one of the major performance metrics for digital circuits

Timing Analysis = the process of verifying that a chip meets its speed requirementE.g., 1 GHz means that next-state function must be

computed within 1 ns

EECS 144/244, UC Berkeley: 4

3

Static Timing Analysis for Circuits

Determine fastest permissible clock speed (e.g. 1 GHz)

clk

Combinationallogic

clk

Combinationallogic

clk

Combinationallogic

EECS 144/244, UC Berkeley: 5

p p ( g )by determining delay of longest path from register to register (e.g. 1ns.)

Cycle Time - Critical Path Delay

Cycle time (T) cannot be smaller than longest path delay (Tmax) data

cycle time

p y ( max)

Longest (critical) path delay is a function of:

Total gate, wire delays

Tclock1

data

maxT T

EECS 144/244, UC Berkeley: 6

clock

Q1 Q2

Tclock1 Tclock2critical path,

~5 logic levels

4

Cycle Time - Setup Time

For FFs to correctly latch data, input data must be stable during: data

setup time

be stable during:

Setup time (Tsetup) beforeclock arrives

Tclock1

data

max setupT T T

EECS 144/244, UC Berkeley: 7

clock

Q1 Q2

Tclock1 Tclock2critical path,

~5 logic levels

Cycle Time - Clock-skew

Tclock2

T

dataIf clock network has unbalanced delay – clock skew Tclock1

Q2clock skew

Q2

skew

Cycle time is also a function of clock skew (Tskew)

max setup skewT T T T

EECS 144/244, UC Berkeley: 8

clock

Q1 Q2

Tclock1 Tclock2

critical path, ~5 logic levels

5

Cycle Time - Clock to Q

Cycle time is also a function of propagation delay of

Tclock1

T l k2

data

p p g yFF (Tclk-to-Q)

Tclk-to-Q : time from arrival of clock signal till change at FF output)

Q

Tclock2

Q2clock-to-Q

Q

max setup skew clk to QT T T T T

EECS 144/244, UC Berkeley: 9

clock

Q1 Q2

Tclock1 Tclock2

Q2

critical path, ~5 logic levels

Min Path Delay - Hold Time

For FFs to correctly latch data, input data must be stable during Hold time T

data

hold time

stable during Hold time (Thold) after clock arrives

Determined by delay of shortest path in circuit (Tmin) and clock skew (Tskew)

Tclock1

min hold skewT T T

EECS 144/244, UC Berkeley: 10

clock

Q1 Q2

Tclock1 Tclock2short path, ~3

logic levels

6

Summary

Need to be able to compute

Tmin: delay of “fastest” path in the circuit

Tmax: delay of “slowest” (critical) path in the circuit

EECS 144/244, UC Berkeley: 11

Modeling Timing in a Combinational Circuit

Arrival time in green A 2

1

0.20

10

X

Z

W

.05

C

B

f

2

2

1

1

0

.20

.20

.10

YZ

.15

.05

1

Interconnect delay in red

Gate delay in blue

What’s the right mathematical

EECS 144/244, UC Berkeley: 13

What s the right mathematical object to use to represent this physical object?

7

Modeling - 1

Use a labeled directed graph

G = <V E>

A 21

0.20

10

X

Z

W

.05

X0.05A

2

G = <V,E>

Vertices represent gates, primary inputs and primary outputs

Edges represent wires

C

B

f

2

2

1

1

0

.20

.20

.10

YZ

.15

.05

1

EECS 144/244, UC Berkeley: 14

C

B

f

Y

W .05.1

1

.2

.15.20

.20

12

2

2

Z

Labels represent delays

What about arrival times?

0

0

1

s

Modeling - 2

C

X0

0

A

15.201

2

2

Find longest/shortest paths in a directedgraph G = <V,E>

C

B

f

Y

W .05.1

1

.2

0

1

.15

.202

2

Z

g ap G ,

What sort of directed graph do we have?

Is this in the standard form for a

s

EECS 144/244, UC Berkeley: 15

longest/shortest path problem?

8

Split Nodes into Edges

C

X0

05.10

A

.15.201

2

2C

B

f

Y

W .05.1

1

.2

0

1.20

2

Z

22

s

EECS 144/244, UC Berkeley: 16

2 1

0.5 2.5

3

DAG with Weighted Edges

X0

A

Cf

Y

W1.050.1

1

0.2

0

0

1

2.15

2.2

2.2Zs

EECS 144/244, UC Berkeley: 17

B1

Problem: Find the longest (critical) path from source s to sink f.

9

Naïve Approach: Enumerate Paths

A

How many paths in this example?In the worst case?

Cf

X

W

0

1.05.1

.2

0

0

A

2.15

2.2

2 2Zs

EECS 144/244, UC Berkeley: 18

B

Y11

2.2

Problem: Find the longest path from source s to sink f.

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD](dynamic programming)

Let w(u,v) denote weight of edge from u to v

Steps:

1. Topologically sort vertices (graph is acyclic)

order: v1, v2, …, vn v1 = s, vn = ?

2. For each vertex v, compute

EECS 144/244, UC Berkeley: 19

d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)

10

Algorithm 1: Longest path in a DAG

Critical Path Method [Kirkpatrick 1966, IBM JRD]

Let w(u,v) denote weight of edge from u to v

Steps:

1. Topologically sort vertices

order: v1, v2, …, vn v1 = s, vn = f

2. For each vertex v, compute

d( ) l th f l t th f t

Time Complexity?O(m+n)

EECS 144/244, UC Berkeley: 20

d(v) = length of longest path from source s to v

d(v1) = 0

For i = 2..n

d(vi) = maxall incoming edges (u, vi)d(u) + w(u,vi)

Run the CPM on our example

Graph vs. Circuit

Delay of a combinational circuit depends on Circuit topology (graph model)

Delay model (e.g., fixed vs. variable)

Boolean behavior of gates

We have only considered circuit topology so far.

EECS 144/244, UC Berkeley: 21

Longest/shortest path found on the graph can be very pessimistic: Paths can be FALSE

Delay values are BOUNDS

11

False Paths (consider Transition Mode)

A path is false if it cannot be responsible for the delay of a circuit

EECS 144/244, UC Berkeley: 22

Graph model implies path of length 6

False Paths

A path is false if it cannot be responsible for the delay of a circuit

EECS 144/244, UC Berkeley: 23

Graph model implies path of length 6

12

False vs. True Paths

TRUE path = one that can be responsible for the delay of a circuita circuit

Need techniques to find whether a path is TRUE or FALSE

EECS 144/244, UC Berkeley: 24

The Fixed Delay Model: Constant delay for each gate (or wire) Typo: this should be 1

EECS 144/244, UC Berkeley: 25

13

Paradoxical Behavior with the Transition Model?Typo: this should be 1

non-monotonic

EECS 144/244, UC Berkeley: 26

w.r.t. time delays!

Problems with Fixed Delay + Transition Model

1. Transition model can be tricky to reason about

Fi d d l li i d2. Fixed gate delays are unrealistic, due to manufacturing process variations

More realistic delay model: Lower and upper bounds

EECS 144/244, UC Berkeley: 27

Perform timing analysis for a whole family of circuits that share the same lower/upper bounds

14

Fixed Delays Bounded Delays

Want algorithms that report the critical path delay

of the slowest circuit in the circuit familyof the slowest circuit in the circuit family

EECS 144/244, UC Berkeley: 28

Paper: Computation of Floating Mode Delay in Combinational

Circuits: Theory and Algorithms, Devadas, Keutzer, Malik, 1993

(on bspace)

RETIMING

EECS 144/244, UC Berkeley: 29

15

Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 30

[Shenoy, 1997]

6

4

clock period =

# registers =

Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 31

5

4

clock period =

# registers =

[Shenoy, 1997]

16

Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 32

clock period =

# registers =

4

3

[Shenoy, 1997]

Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 33

clock period =

# registers =

2

4

[Shenoy, 1997]

17

Goals of Retiming

Possible goals:

• Minimize clock period (min-period retiming)

• Minimize number of registers (min-area retiming)Minimize number of registers (min area retiming)

• Minimize number of registers for a target clock period (constrained min-area retiming)

EECS 144/244, UC Berkeley: 34[Shenoy, 1997]

Abstraction:Circuit Graph

Nodes: circuit elementsNode weights: delay

EECS 144/244, UC Berkeley: 35

Dummy node for fanout

18

Abstraction:Registers?

EECS 144/244, UC Berkeley: 36

Abstraction - Registers

0

1

1

1

0

00 0

0 0 0

0

000

00

EECS 144/244, UC Berkeley: 37

Arc weights indicate registers

1

19

Abstraction - Environment

0

1

0

00 0

0 0 0

0

000

0

EECS 144/244, UC Berkeley: 38

11

1

0

Cutset – Divides the Graph in Two

0

1

0

00 0

0 0 0

0

000

0

EECS 144/244, UC Berkeley: 39

11

1

0

20

Retiming: Add a register on all arcs crossing the cutset in one direction, and subtract a register from all arcs crossing the cutset in the other direction.

0 0+1=1

1

0

00

00+1=1

0

0

000

0

0+1=1

EECS 144/244, UC Berkeley: 40

11-1=0

1-1=0

0

Recall: Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 41

[Shenoy, 1997]

6

4

clock period =

# registers =

21

Recall: Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 42

5

4

clock period =

# registers =

[Shenoy, 1997]

Simplest cutset surrounds one node

0

1

1-1=0

1-1=00

00

0+1=1 0

0

000

0

EECS 144/244, UC Berkeley: 43

10 0

0

22

Recall: Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 44

5

4

clock period =

# registers =

[Shenoy, 1997]

Recall: Retiming Tradeoffs

aa

b 1

1

12

2

2

EECS 144/244, UC Berkeley: 45

clock period =

# registers =

4

3

[Shenoy, 1997]

23

For each node v, define r (v) = # of registers moved from the outputs to the inputs.

0

r (v) = -1

A retiming is now an assignment r (v) for every node v such that the weight of every arc is non-negative.

1

1-1=0

1-1=00

00

0+1=1 0

0

000

0v

y g

EECS 144/244, UC Berkeley: 46

10 0

0

Rest of the story

Retiming operations = graph transformations

Each graph has a vector of costs: (clock period, # registers, …)

Formulate and solve optimization problem.

Papers (on bspace):

EECS 144/244, UC Berkeley: 47

Papers (on bspace):1. Leiserson, C. E. and J. B. Saxe (1983). "Optimizing synchronous systems."

Journal of VLSI and Computer Systems: pp. 41-67.

2. Leiserson, C. E. and J. B. Saxe (1991). "Retiming synchronous circuitry." Algorithmica 6(1): pp. 5-35.

3. Shenoy, N. (1997). "Retiming: Theory and practice." Integration, the VLSI Journal 22: pp. 1-21.