March 28, 2007 1
Glitch Reduction for Altera Stratix II devices
Tomasz S. Czajkowski
PhD Candidate
University of Toronto
Supervisor: Professor Stephen D. Brown
March 28, 2007 2
Outline
Motivation
Power Model
Glitch Reduction Algorithm
Results
Conclusion
March 28, 2007 3
Motivation
Glitches: undesirable logic transitions that occur due to delay imbalance in the logic circuit
Waste power and do not provide any useful functionality
Can increase the average toggle rate of a net by as much as a factor of 2
Glitches can be filtered out by strategically inserting negative edge triggered FFs
March 28, 2007 4
Glitches in FPGAs
Due to unequal arrival time of signals at the inputs of LUTs
Glitches can be propagated through LUTs
[Figure: two 4-LUTs; a glitch is labelled "Generated" at one and "Propagated" at the other]
March 28, 2007 5
Reducing Glitches
Insert a negative edge triggered FF after a LUT that produces or propagates glitches
[Figure: two 4-LUTs with a clock input; labels: "Generated", "No glitches"]
March 28, 2007 6
Alternatives
Gated D-latch
  Implement a gated D-latch in a LUT
  Input signal is transparent during the latter half of the clock period
Gated LUT
  Gate the output of a LUT with the clock input using an AND or an OR gate
  Similar effect as gated D-latch
  Can generate glitches too
When implemented
  Gated D-latch consumes 50% more power than a FF and double that of a gated LUT
  Neither alternative is very effective
March 28, 2007 7
Background on Dynamic Power
Average Net Dynamic Power Dissipation

$$P_{avg} = \frac{1}{2} \sum_{i=0}^{\#nets-1} V^2 \cdot s_i \cdot f_{clock} \cdot C_i$$

$P_{avg}$ is the average power
$V$ is the supply voltage
$f_{clock}$ is the clock frequency
$s_i$ is the average per-cycle toggle rate of a net
$C_i$ is the capacitance of a net
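The average dynamic power expression on this slide can be evaluated directly. The following is a minimal sketch; the function name, the net values, the supply voltage and the clock frequency are illustrative assumptions, not figures from the talk.

```python
def average_dynamic_power(nets, v_supply, f_clock):
    """Return P_avg = 1/2 * V^2 * f_clock * sum(s_i * C_i), in watts.

    nets is an iterable of (s_i, C_i) pairs: average toggle rate per
    clock cycle and net capacitance in farads.
    """
    switched = sum(s_i * c_i for s_i, c_i in nets)
    return 0.5 * v_supply ** 2 * f_clock * switched


# Hypothetical nets (toggle rate, capacitance); not measured values.
example_nets = [(0.5, 10e-15), (0.25, 20e-15)]

p_before = average_dynamic_power(example_nets, v_supply=1.2, f_clock=200e6)

# Halving every toggle rate (for example by filtering glitches) halves P_avg,
# because the expression is linear in s_i.
halved_nets = [(s_i / 2, c_i) for s_i, c_i in example_nets]
p_after = average_dynamic_power(halved_nets, v_supply=1.2, f_clock=200e6)

print(p_before, p_after)
```

Because the expression is linear in each toggle rate, any glitch removed from a net reduces that net's contribution in proportion to its capacitance.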
March 28, 2007 8
Power Model
Goal
  To be able to compute the change in dynamic power dissipation in the logic elements affected by a negative edge triggered FF insertion
    Power dissipated by a LUT and a FF
    Toggle rate of logic signals (si)
    Net capacitance (Ci)
March 28, 2007 9
LUT Power
The LUT itself dissipates a non-trivial amount of power when its inputs toggle
We look at how the power dissipated by a LUT relates to the frequency of its output transitions
March 28, 2007 10
LUT Power Model
March 28, 2007 11
FF Power
How much power would it cost to insert a FF into a circuit?
What about the power cost of alternatives to a FF?
  Gated LUT
  Gated D-latch
March 28, 2007 12
Clocked Element Power Comparison
March 28, 2007 13
Wire Properties
Name (Notation): Description
Static Probability (P[y]): Probability that a wire assumes the logic value 1 in any given clock cycle.
Transition Probability (Pt(y)): The average number of state transitions, excluding glitches.
Low to High Transition Probability (P[y’=1 | y=0]): Probability that a wire will change state to logic value 1, given that it is at a logic value 0 at present.
High to Low Transition Probability (P[y’=0 | y=1]): Probability that a wire will change state to logic value 0, given that it is at a logic value 1 at present.
Transition Density (D(y)): The average number of logic value transitions per cycle. Includes glitches.
Average Number of Glitches per cycle (D(y) - Pt(y)): The average number of useless transitions per clock cycle.
March 28, 2007 14
Examples of Wires
P[y]   Pt(y)   P[y’=1 | y=0]   P[y’=0 | y=1]   D(y)   D(y) - Pt(y)
½      1       1               1               1      0
½      ½       ≈0.4            ≈0.4            ½      0
1/8    ¼       1/8             1               ¼      0
1/8    ¼       1/8             1               ½      ¼

[Figure: waveforms labelled Clock, A, B, C, D]
March 28, 2007 15
Example 1
[Figure: logic gate with inputs x1 and x2 and output y]

Name   P[y]   Pt(y)   P[y’=1 | y=0]   P[y’=0 | y=1]   D(y)
x1     ½      ½       ½               ½               ½
x2     ½      ½       ½               ½               ½

# Transitions on y, Trans(x1x2, x’1x’2):

Initial state   Final state x’1x’2
x1x2            00   01   10   11
00              0    0    0    1
01              0    0    2    1
10              0    0    0    1
11              1    1    1    0
March 28, 2007 16
Static Probability
Let y = f(x1,x2) = x1∙x2

$$P[y] = \sum_{a=0}^{1} \sum_{b=0}^{1} f(a,b) \cdot P[x_1=a] \cdot P[x_2=b] = \frac{1}{4}$$
March 28, 2007 17
Probability of a specific Transition
Compute the probability of a specific transition by using the static probability, 1→0 and 0→1 transition probability of each wire
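$$\begin{aligned}
P[x_1x_2 = 00 \to x_1'x_2' = 10] &= P[x_1=0] \cdot P[x_1'=1 \mid x_1=0] \cdot P[x_2=0] \cdot P[x_2'=0 \mid x_2=0] \\
&= (1-P[x_1]) \cdot P[x_1'=1 \mid x_1=0] \cdot (1-P[x_2]) \cdot (1-P[x_2'=1 \mid x_2=0]) \\
&= \left(1-\tfrac{1}{2}\right) \cdot \tfrac{1}{2} \cdot \left(1-\tfrac{1}{2}\right) \cdot \left(1-\tfrac{1}{2}\right) = \tfrac{1}{16}
\end{aligned}$$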
March 28, 2007 18
Transition Probability
$$P_t(y) = \sum_{x_1x_2=00}^{11} \; \sum_{x_1'x_2'=00}^{11} \big(f(x_1,x_2) \oplus f(x_1',x_2')\big) \cdot P[x_1x_2 \to x_1'x_2'] = \frac{3}{8}$$
March 28, 2007 19
Transition Density
$$D(y) = \sum_{x_1x_2=00}^{11} \; \sum_{x_1'x_2'=00}^{11} Trans(x_1x_2, x_1'x_2') \cdot P[x_1x_2 \to x_1'x_2'] = \frac{1}{2}$$
March 28, 2007 20
0→1 Transition Probability
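$$P[y'=1 \mid y=0] = \frac{\displaystyle\sum_{x_1x_2=00}^{11} \; \sum_{x_1'x_2'=00}^{11} \overline{f(x_1,x_2)} \cdot f(x_1',x_2') \cdot P[x_1x_2 \to x_1'x_2']}{1 - P[y]} = \frac{1}{4}$$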
March 28, 2007 21
1→0 Transition Probability
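$$P[y'=0 \mid y=1] = \frac{\displaystyle\sum_{x_1x_2=00}^{11} \; \sum_{x_1'x_2'=00}^{11} f(x_1,x_2) \cdot \overline{f(x_1',x_2')} \cdot P[x_1x_2 \to x_1'x_2']}{P[y]} = \frac{3}{4}$$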
March 28, 2007 22
Properties of wire y in Example 1
Name   P[y]   Pt(y)   P[y’=1 | y=0]   P[y’=0 | y=1]   D(y)
y      ¼      3/8     ¼               ¾               ½

[Figure: logic gate with inputs x1 and x2 and output y]
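The properties of wire y in Example 1 can be reproduced by enumerating all 16 input transitions. This is a sketch under the assumptions the slides appear to make: y = x1∙x2, the two inputs are statistically independent, and the Trans table is copied from the Example 1 slide. All function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
# Per-input statistics (P[x], P[x'=1 | x=0], P[x'=0 | x=1]) for x1 and x2.
INPUT_STATS = [(HALF, HALF, HALF), (HALF, HALF, HALF)]

# Trans(x1x2, x'1x'2): number of transitions on y, copied from the slide table.
TRANS = {
    (0, 0): {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    (0, 1): {(0, 0): 0, (0, 1): 0, (1, 0): 2, (1, 1): 1},
    (1, 0): {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    (1, 1): {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}


def f(x1, x2):
    """Logic function of Example 1: y = x1 AND x2."""
    return x1 & x2


def state_probability(state):
    """P[x1=a] * P[x2=b], assuming independent inputs."""
    prob = Fraction(1)
    for bit, (p_one, _, _) in zip(state, INPUT_STATS):
        prob *= p_one if bit else 1 - p_one
    return prob


def wire_probability(before, after, stats):
    """Probability that one wire starts at `before` and ends at `after`."""
    p_one, p_rise, p_fall = stats
    if before == 0:
        return (1 - p_one) * (p_rise if after == 1 else 1 - p_rise)
    return p_one * (p_fall if after == 0 else 1 - p_fall)


def pair_probability(initial, final):
    """P[x1x2 -> x'1x'2] as a product over the independent input wires."""
    prob = Fraction(1)
    for before, after, stats in zip(initial, final, INPUT_STATS):
        prob *= wire_probability(before, after, stats)
    return prob


STATES = list(product((0, 1), repeat=2))
PAIRS = [(i, j) for i in STATES for j in STATES]

p_y = sum(f(*s) * state_probability(s) for s in STATES)
p_t = sum((f(*i) ^ f(*j)) * pair_probability(i, j) for i, j in PAIRS)
d_y = sum(TRANS[i][j] * pair_probability(i, j) for i, j in PAIRS)
p_rise = sum((1 - f(*i)) * f(*j) * pair_probability(i, j) for i, j in PAIRS) / (1 - p_y)
p_fall = sum(f(*i) * (1 - f(*j)) * pair_probability(i, j) for i, j in PAIRS) / p_y

print(p_y, p_t, p_rise, p_fall, d_y, d_y - p_t)
```

Exact fractions are used so the results can be compared directly with the table entries; the last printed value is the average number of glitches per cycle, D(y) - Pt(y).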
March 28, 2007 23
Example 2
Name   P[y]   Pt(y)   P[y’=1 | y=0]   P[y’=0 | y=1]   D(y)
x3     ½      ½       ½               ½               ½
y      ¼      3/8     ¼               ¾               ½

[Figure: x1 and x2 feed a gate producing y; y and x3 feed a second gate producing z]
March 28, 2007 24
Computing Properties of wire z
Same computations as in Example 1.
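Increase D(z) to account for glitches that occur on wire y (Dglitch(z)). Do so only when x3 remains at constant 1 for the duration of the clock cycle.

$$\begin{aligned}
D_{glitch}(z) &= P[x_3=1] \cdot P[x_3'=1 \mid x_3=1] \cdot \big(D(y) - P_t(y)\big) \\
&= \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \left(\tfrac{1}{2} - \tfrac{3}{8}\right) = \tfrac{1}{4} \cdot \tfrac{1}{8} = \tfrac{1}{32}
\end{aligned}$$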
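The glitch contribution Dglitch(z) can be checked numerically from the Example 2 table values. This is a small sketch; the function and parameter names are illustrative, and P[x3’=1 | x3=1] is obtained as 1 - P[x3’=0 | x3=1].

```python
from fractions import Fraction


def propagated_glitch_density(p_side_one, p_side_fall, d_in, pt_in):
    """Dglitch(z) = P[x3=1] * P[x3'=1 | x3=1] * (D(y) - Pt(y)).

    p_side_one  : static probability of the side input x3
    p_side_fall : P[x3'=0 | x3=1], so that staying high has probability 1 - p_side_fall
    d_in, pt_in : transition density and transition probability of wire y
    """
    p_stays_high = 1 - p_side_fall
    return p_side_one * p_stays_high * (d_in - pt_in)


# Values taken from the Example 2 table: x3 has all entries 1/2, y has D = 1/2, Pt = 3/8.
d_glitch_z = propagated_glitch_density(
    p_side_one=Fraction(1, 2),
    p_side_fall=Fraction(1, 2),
    d_in=Fraction(1, 2),
    pt_in=Fraction(3, 8),
)
print(d_glitch_z)
```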
March 28, 2007 25
Minimum Pulse Width
When using the table to compute the number of transitions on a wire given the initial and final state of the LUT inputs, we can also compute intermediate transitions and their duration
Some intermediate pulses will be too short to cause a full logic change at the logic output
  The minimum pulse width depends on the target device used
We remove those pulses from the computation
  Any pulse with duration less than 0.25 ns is removed
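One way to picture the minimum pulse width rule is as a filter over the edge times of a wire within a clock cycle. The sketch below is a simplified assumption about how such a filter could work, not the procedure used in the talk: two consecutive edges closer together than the minimum width are treated as a pulse that never fully forms, and both edges are dropped. Only the 0.25 ns threshold comes from the slide.

```python
def filter_short_pulses(edge_times_ns, min_pulse_width_ns=0.25):
    """Drop pairs of consecutive edges that form a pulse shorter than the minimum width.

    edge_times_ns lists the times (in ns) at which the wire toggles.
    Returns the edge times that survive the filter.
    """
    kept = []
    for t in sorted(edge_times_ns):
        if kept and t - kept[-1] < min_pulse_width_ns:
            # The previous edge and this one bound a too-short pulse: remove both.
            kept.pop()
        else:
            kept.append(t)
    return kept


# Hypothetical edge times: a 0.1 ns pulse followed by a real transition.
with_glitch = filter_short_pulses([1.0, 1.1, 2.0])
# Hypothetical edge times where every pulse is at least 0.5 ns wide.
without_glitch = filter_short_pulses([1.0, 1.5, 2.0])
print(len(with_glitch), len(without_glitch))
```

The number of surviving edges is what would be counted as transitions when filling in the transition table.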
March 28, 2007 26
Estimate Error
March 28, 2007 27
Particular Example: mux64_16bit
March 28, 2007 28
Particular Example: des_perf_opt
March 28, 2007 29
Particular Example: cf_fir_24_8_8
March 28, 2007 30
Particular Example: huffman
March 28, 2007 31
Net Capacitance
We need to be able to estimate net capacitance to figure out the difference in dynamic power dissipation due to a change in the transition density of a net
Relate net capacitance (unavailable directly) to net delay (available through timing report)
Distinguish between nets of different fanout
March 28, 2007 32
Fanout 1 Net Capacitance
March 28, 2007 33
Fanout 2 Net Capacitance
March 28, 2007 34
Fanout 3 Net Capacitance
March 28, 2007 35
Fanout 4 Net Capacitance
March 28, 2007 36
Higher Fanout Net Capacitance
In our benchmark set fewer than 5% of the nets had fanout greater than 4
  Clock net is excluded from calculation
Approximate capacitance of net with fanout n>4 as:

$$C(n) = \left\lfloor \frac{n}{4} \right\rfloor \cdot C(4) + C(n \bmod 4)$$

  Not exact, but supports the fact that glitches on nets with high fanout are bad
  Average estimate error of +22%
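The high-fanout approximation can be sketched as follows. The per-fanout capacitances in the example are hypothetical placeholders (the measured curves are not part of this transcript), and treating C(0) as zero when n is a multiple of 4 is an assumption.

```python
def net_capacitance(fanout, measured):
    """Estimate net capacitance for any fanout >= 1.

    measured maps fanout 1..4 to a capacitance estimate. For fanout n > 4
    the slide's approximation is used: floor(n / 4) * C(4) + C(n mod 4),
    with C(0) taken to be 0 (an assumption).
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout <= 4:
        return measured[fanout]
    groups_of_four, remainder = divmod(fanout, 4)
    tail = measured[remainder] if remainder else 0.0
    return groups_of_four * measured[4] + tail


# Hypothetical capacitances for fanout 1..4, in arbitrary units.
example_caps = {1: 1.0, 2: 1.5, 3: 2.0, 4: 2.5}

c10 = net_capacitance(10, example_caps)  # 2 * C(4) + C(2)
c8 = net_capacitance(8, example_caps)    # 2 * C(4)
print(c10, c8)
```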
March 28, 2007 37
Algorithm
1. Scan all nets in a logic circuit to determine if negative edge FF insertion can be applied
2. Analyze the resulting set of nets to determine the benefit of applying the optimization to each net (determined by the cost function)
3. Apply the optimization to a net on which the most power could be saved
4. Repeat until no beneficial choices are found
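The four steps above amount to a greedy loop. The sketch below shows only the control flow; `evaluate` and `apply_insertion` are hypothetical callbacks standing in for the cost function and for the actual FF insertion, and the candidate set stands in for the result of the scan in step 1.

```python
def reduce_glitches(candidates, evaluate, apply_insertion):
    """Greedily apply negative edge FF insertion while it is beneficial.

    candidates      : nets on which insertion is legal (step 1)
    evaluate        : net -> cost, where a negative cost means power is saved (step 2)
    apply_insertion : net -> None, performs the modification (step 3)
    Returns the nets that were modified, in order.
    """
    remaining = set(candidates)
    applied = []
    while remaining:
        # Re-evaluate every iteration: an insertion can change the benefit elsewhere.
        scores = {net: evaluate(net) for net in sorted(remaining)}
        best = min(scores, key=scores.get)
        if scores[best] >= 0:
            break  # step 4: no beneficial choice is left
        apply_insertion(best)
        applied.append(best)
        remaining.discard(best)
    return applied


# Toy example with made-up costs: modifying n1 removes the benefit of modifying n2.
toy_cost = {"n1": -3.0, "n2": -1.0, "n3": 0.5}


def toy_apply(net):
    if net == "n1":
        toy_cost["n2"] = 0.2


modified = reduce_glitches(toy_cost.keys(), toy_cost.get, toy_apply)
print(modified)
```

Re-evaluating all remaining candidates after each insertion is a design choice of this sketch; it reflects that removing glitches on one net also changes the glitches seen in its transitive fanout.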
March 28, 2007 38
Cost Function
Compute change in power (∆P)
  + cost of adding a FF
  - power saved on the modified net
  - power saved on nets and LUTs in the transitive fanout of the added FF
Compute the change in the minimum clock period (∆T)
Specify ∆T allowed (∆Ta)

$$\Delta C = \Delta P \cdot \big(1 - u(\Delta T - \Delta T_a)\big)$$

where u(x) is the step function
Accept change when ∆C < 0
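The cost function on this slide can be written as a short sketch. The value of the step function at exactly zero is not given in the transcript; here u(x) is assumed to be 1 only for x > 0, so a change that exactly meets the allowed ∆T is still permitted. The numeric inputs are made up.

```python
def step(x):
    """Step function u(x); assumed to be 1 for x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def cost(delta_p, delta_t, delta_t_allowed):
    """Cost = delta_p * (1 - u(delta_t - delta_t_allowed)).

    delta_p         : change in power (negative means power is saved)
    delta_t         : change in the minimum clock period
    delta_t_allowed : largest acceptable change in the minimum clock period
    """
    return delta_p * (1.0 - step(delta_t - delta_t_allowed))


def accept(delta_p, delta_t, delta_t_allowed):
    """A change is accepted only when its cost is negative."""
    return cost(delta_p, delta_t, delta_t_allowed) < 0


# Made-up numbers: a 5 mW saving within and outside the timing budget.
within_budget = accept(delta_p=-5.0, delta_t=0.1, delta_t_allowed=0.2)
over_budget = accept(delta_p=-5.0, delta_t=0.3, delta_t_allowed=0.2)
print(within_budget, over_budget)
```

When the timing budget is exceeded the cost collapses to zero, so the change is rejected no matter how much power it would save.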
March 28, 2007 39
Example
[Figure: a logic network of LUTs whose outputs feed FFs]
March 28, 2007 40
Example: Inserted FF
[Figure: the same network of LUTs and FFs with a negative edge triggered FF (NegFF) inserted]
March 28, 2007 41
Example: Compute change in the # of glitches
[Figure: the network of LUTs and FFs with the inserted NegFF]
March 28, 2007 42
Example: Compute change in the # of glitches
[Figure: the network of LUTs and FFs with the inserted NegFF]
March 28, 2007 43
Example: Compute change in LUT power dissipation
[Figure: the network of LUTs and FFs with the inserted NegFF]
March 28, 2007 44
Experimental Results
8 benchmark circuits taken from QUIP package
Synthesize, place, route and analyze timing of a circuit using Quartus II 5.1
Apply algorithm to reduce glitches in a circuit
  Aim to increase the minimum clock period by no more than 5%
Perform timing analysis once the circuit has been modified
Use ModelSim-Altera 6.0c for simulation
  Simulate a circuit both pre- and post-modification using the same clock frequency
Use PowerPlay Power Analyzer to estimate the average dynamic power dissipation of each circuit
March 28, 2007 45
Experimental Results
Freq: simulation clock frequency. T: minimum clock period. P: dynamic power dissipation.

Circuit name                               Freq (MHz)   T initial (ns)   T final (ns)   T change (%)   P initial (mW)   P final (mW)   P change (%)
Barrel64*                                  200          4.386            4.806          8.74           229.94           189.7          -17.50
mux64_16bit                                275          3.052            3.052          0              389.24           389.24         0.00
fip_cordic_rca                             125          7.551            7.851          3.82           43.28            39.49          -8.76
oc_des_perf_opt                            290          2.989            3.07           2.64           1058.8           796.7          -24.75
oc_video_compression_systems_huffman_enc   260          3.626            3.626          0              94.88            95.19          0.33
cf_fir_24_8_8                              170          5.375            5.71           5.87           290.41           292.9          0.84
aes128_fast                                140          6.251            6.569          4.84           879.24           870.6          -0.99
rsacypher                                  140          6.376            6.563          2.85           50.73            48.22          -4.95
Average                                                                                 +3.6                                           -7.0
March 28, 2007 46
Observations (1)
oc_des_perf_opt
  Large number of XOR gates present
  Removing glitches from one node removes a lot of glitches on the nodes in its transitive fanout (up to the next FF)
mux64_16bit
  The cost function determined that no net was a good candidate for optimization
  Very few glitches were present in the circuit and the power they dissipate was not large enough to warrant the insertion of FFs
March 28, 2007 47
Observations (2)
cf_fir_24_8_8
  Overestimated toggle rate caused the algorithm to apply negative edge triggered FF insertion excessively
  Need to include spatial correlation in the toggle rate model
aes128_fast
  Toggle rate is 50% higher than in oc_des_perf_opt
  Most nets use local LAB connections, causing little power dissipation
  Insertion of 173 FFs only achieved 1% power reduction
    Saved 35.14 mW in routing alone, because toggle rate on all affected wires was reduced by 50-70%
    Added 24.6 mW due to FF insertion
    Added 1.86 mW to the power dissipated by the clock network, because new LABs were connected to the clock network
    Net win of 8.68 mW
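The aes128_fast power budget can be checked with a short worked calculation. The relative figure uses the initial dynamic power of 879.24 mW reported for this circuit in the results table.

$$\Delta P = -35.14 + 24.6 + 1.86 = -8.68\ \text{mW}, \qquad \frac{8.68}{879.24} \approx 0.99\%$$

This agrees with the roughly 1% reduction quoted for this circuit.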
March 28, 2007 48
Conclusion
Negative edge triggered FF insertion can work well to reduce glitches in a circuit
Unlike retiming, our approach only needs to ensure that exactly one negative edge triggered FF is on any given combinational path
  Retiming may require the translation of more than a single FF to be valid
March 28, 2007 49
Future Work
Better toggle rate prediction algorithm that includes spatial correlation
Having FFs that can be negative edge triggered without using an additional LAB clock line would make the cost of this optimization lower
  Silicon area cost vs. frequency of use trade-off
March 28, 2007 50
Acknowledgement
We’d like to express our gratitude to Altera for funding this research
We’d like to thank Altera Toronto in particular for dedicating some of their time to answer our questions and provide insight throughout the course of this work
March 28, 2007 51
Questions?