1
Dynamic and Leakage Power Reduction in MTCMOS
Circuits Using an Automated Efficient Gate Clustering
Technique
Mohab Anis, Shawki Areibi *, Mohamed Mahmoud and Mohamed Elmasry
VLSI Research Group, University of Waterloo, Canada
* School of Engineering, University of Guelph, Canada
2
Presentation Outline
• Low Power Design in DSM
• Concept of sleep transistors
• Previous work
• Sizing the sleep transistor
• Bin-Packing technique
• Set-Partitioning technique
• Conclusion and extended work done
3
Why Low Power Design?
• Growing market of mobile and handheld electronic systems.
• Difficulty in providing adequate cooling. Fans create noise and add to cost.
• Heat dissipation impacts packaging technology and cost
• Increasing standby time of portable devices.
In deep-submicron (DSM) regimes, leakage power has become as big a problem as dynamic power.
4
Concept of sleep transistors
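[Figure: an LVT logic block gated by a SLEEP-controlled HVT transistor, with the internal node labelled VX; alongside, the same LVT logic block with the sleep transistor replaced by a resistor R carrying current I.]
Modeling of a sleep transistor as a resistor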
MTCMOS technology is an increasingly popular technique to reduce leakage power
Proper sleep transistor (ST) sizing is a key issue:
↑ ST size ⇒ ↑ Area, ↑ Pdynamic, ↑ Pleakage
↓ ST size ⇒ ↑ Delay
5
First Approach [1]
• A single ST supports the whole circuit.
• Interconnect resistance increases for distant blocks.
• ST size must be increased to compensate for the added resistance ⇒ ↑ Area, ↑ Pdynamic, ↑ Pleakage.
• The effect is more significant in the DSM regime.
[1] S. Mutoh et al., “1-V Power Supply High-Speed Digital Circuit Technology with Multi-Threshold Voltage CMOS,” IEEE J. of Solid-State Circuits, pp. 847-853, 1995.
[Figure: an LVT logic circuit gated by a single HVT sleep transistor driven by SLEEP.]
6
Second Approach [2]
• A single ST is sized according to a mutually exclusive discharge pattern algorithm.
• ST assignments are wasteful.
• Interconnect resistance increases for distant blocks; ST size must be increased to compensate for the added resistance ⇒ ↑ Pdynamic, ↑ Pleakage.
• The effect is more significant in the DSM regime.
[Figure: example circuit with gates G1 to G10.]
[2] J. Kao et al., “MTCMOS Hierarchical Sizing Based on Mutual Exclusive Discharge Patterns,” in Proc. of 35th DAC, pp. 495-500, Las Vegas, 1998.
7
Sizing the sleep transistor
• Objective: Constant ST size, causing 5% degradation in circuit speed.
• $(W/L)_{sleep} = \dfrac{I_{sleep}}{0.05\,\mu_n C_{ox}\,(V_{dd}-V_{tL})(V_{dd}-V_{tH})}$
• $I_{sleep}$ is chosen to be 250 µA.
• $(W/L)_{sleep} \approx 6$ for 0.18 µm CMOS technology, with $V_{tL}$ = 350 mV and $V_{tH}$ = 500 mV.
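The sizing formula on this slide can be evaluated directly. The sketch below is only an illustration: the slide does not state the supply voltage or the process transconductance, so Vdd = 1.8 V and µn·Cox = 440 µA/V² are assumed placeholder values, picked because they land near the quoted aspect ratio of 6. The function name and parameter names are hypothetical.

```python
def sleep_transistor_aspect_ratio(i_sleep, mu_n_cox, vdd, vt_low, vt_high,
                                  speed_penalty=0.05):
    """Return (W/L) of the sleep transistor from the slide's sizing formula.

    i_sleep       -- current the sleep transistor must carry, in amperes
    mu_n_cox      -- process transconductance mu_n * Cox, in A/V^2 (assumed)
    vdd           -- supply voltage in volts (assumed, not on the slide)
    vt_low        -- threshold voltage of the LVT logic, in volts
    vt_high       -- threshold voltage of the HVT sleep transistor, in volts
    speed_penalty -- tolerated speed degradation (0.05 means 5%)
    """
    return i_sleep / (speed_penalty * mu_n_cox
                      * (vdd - vt_low) * (vdd - vt_high))


# Slide values: Isleep = 250 uA, VtL = 350 mV, VtH = 500 mV.
# Assumed values: Vdd = 1.8 V, mu_n * Cox = 440 uA/V^2.
ratio = sleep_transistor_aspect_ratio(250e-6, 440e-6, 1.8, 0.35, 0.50)
print(round(ratio, 2))
```

Relaxing the tolerated speed penalty shrinks the transistor proportionally, which is the area and power versus delay trade-off described earlier in the deck.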
8
4-bit CLA Adder
9
Preprocessing of Gate Currents
• Random inputs are applied to the CLA adder; the highest discharge current of each gate is monitored and multiplied by the corresponding switching activity.
• For each gate, the peak current value, its time of occurrence, and its duration are recorded.
• Currents are combined into a single equivalent current Ieq = max{Ii} when the sum of the Ii over time does not exceed max{Ii}.
10
Timing Diagram
[Figure: discharge currents I1 (G1, FO=2) and I2 (G2, FO=4) plotted against time; peak values 65 (I1) and 79 (I2); annotated times T1 = 80 psec, T1+T2 = 210 psec, 120 psec, and 260 psec.]
I1 (G1): 0 0 11 22 33 43 54 65 54 43 33 22 11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I2 (G2): 0 0 0 0 0 0 0 6 12 18 24 30 37 43 49 55 61 67 73 79 73 67 61 55 49 43 37 30 24 18 12 6 0 0 0 0 0 0 0
11
Preprocessing Heuristic
1. Initialize current vectors
2. Set all gates free to move to a sub-cluster
3. For all gates in circuit
     If gate i is not clustered yet
       assign gate i to new cluster k
       update cluster current vector
       calculate max current, start, end time
       For all other gates in circuit
         If (gate j is not clustered yet)
           add current of gate j to cluster k
           If (combination ≤ max current)
             append gate to cluster
             update cluster info
             set gate j locked in cluster k
       End For
   End For
4. Return all clusters formed.
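A minimal Python sketch of this heuristic follows, using the two current vectors from the timing diagram (padded to a common length of 39 samples). It rests on one assumption that the slides do not spell out: a gate is accepted into a cluster only if the peak of the summed current vector does not exceed the larger of the two individual peaks, so that the equivalent current stays at max{Ii}. The function name `preprocess_currents` and the third gate G3 are hypothetical.

```python
def preprocess_currents(currents):
    """Greedily merge gates whose summed current never exceeds the larger peak.

    currents maps a gate name to a list of sampled current values; all lists
    must have the same length. Returns (gate_names, equivalent_current) tuples.
    """
    clustered = set()
    clusters = []
    for gate_i, vec_i in currents.items():
        if gate_i in clustered:
            continue
        # Start a new cluster seeded with gate i.
        members, cluster_vec = [gate_i], list(vec_i)
        clustered.add(gate_i)
        for gate_j, vec_j in currents.items():
            if gate_j in clustered:
                continue
            combined = [a + b for a, b in zip(cluster_vec, vec_j)]
            # Assumed acceptance test: merging must not push the peak above
            # the larger of the two individual peaks (Ieq = max{Ii}).
            if max(combined) <= max(max(cluster_vec), max(vec_j)):
                members.append(gate_j)
                cluster_vec = combined
                clustered.add(gate_j)
        clusters.append((members, max(cluster_vec)))
    return clusters


# I1 and I2 are taken from the timing diagram, padded to 39 samples each.
I1 = [0, 0, 11, 22, 33, 43, 54, 65, 54, 43, 33, 22, 11] + [0] * 26
I2 = [0] * 7 + [6, 12, 18, 24, 30, 37, 43, 49, 55, 61, 67, 73, 79,
                73, 67, 61, 55, 49, 43, 37, 30, 24, 18, 12, 6] + [0] * 7
# G3 is a hypothetical gate that discharges at the same time as G1.
clusters = preprocess_currents({"G1": I1, "G2": I2, "G3": list(I1)})
print(clusters)  # [(['G1', 'G2'], 79), (['G3'], 65)]
```

G1 and G2 overlap only slightly, so their sum peaks at 79 and they share one equivalent current; G3 coincides with G1 and is forced into its own cluster.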
12
Bin-Packing Technique
Objective: Minimize the number of STs used.
Subject to: 1. Σ Ieq ≤ Imax for any ST.
 2. Each Ieq is assigned exactly once.
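To make the formulation concrete, here is a small sketch that packs equivalent currents into sleep transistors of capacity Imax. It uses first-fit decreasing, a standard bin-packing heuristic chosen here for brevity; the slides do not say which solver the authors used, and the heuristic does not guarantee the minimum number of bins. The current values and the function name are hypothetical.

```python
def first_fit_decreasing(currents, i_max):
    """Assign each equivalent current to exactly one sleep transistor (bin).

    currents maps a name to an equivalent current in microamperes.
    Returns a list of bins, each a list of (name, current) pairs whose
    total does not exceed i_max.
    """
    bins = []
    # Place the largest currents first; this tends to waste less capacity.
    for name, cur in sorted(currents.items(), key=lambda kv: kv[1], reverse=True):
        if cur > i_max:
            raise ValueError(f"{name} exceeds the capacity of a single ST")
        for b in bins:
            if sum(c for _, c in b) + cur <= i_max:
                b.append((name, cur))
                break
        else:
            # No existing sleep transistor has room: open a new one.
            bins.append([(name, cur)])
    return bins


# Hypothetical equivalent currents (uA); Imax = 250 uA as in the sizing slide.
ieq = {"IEQ1": 150, "IEQ2": 100, "IEQ3": 90, "IEQ4": 80, "IEQ5": 70}
bins = first_fit_decreasing(ieq, 250)
loads = [sum(c for _, c in b) for b in bins]
print(len(bins), loads)  # 2 [250, 240]
```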
13
Currents Assignment
[Figure: the gates of the CLA adder are grouped into equivalent currents IEQ1 to IEQ7, which are assigned to two sleep transistors (1 and 2) carrying 240 and 250 µA.]
Assigned gates, as grouped in the figure:
G1 G2 G3 G4 G9 G10 G11 G12 G13 G15 G17 G19 G20 G21 G22 G24 G25 G26 G27 G28
G5 G6 G7 G8 G14 G16 G18 G23
14
Clustering of CLA adder
15
Set-Partitioning Technique
[Figure: standard-cell row layout with Vdd and gnd rails, the cell height, a ground rail, and a sleep device cavity; gates G1 to G28 are placed in the rows, and a length Lmin is marked.]
16
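Cost Function
$C_j = (w_1 \cdot C_{j1}) + (w_2 \cdot C_{j2})$
$C_{j1} = \text{Sleep\_Transistor max\_current} - \sum_i \text{current}_i$
$C_{j2} = \sum d_{uv}$ over the gate pairs in a group $S_j$
[Figure: a group Sj containing gates Gu, Gv, and Gw, with pairwise distances duv, dvw, and dwu.]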
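As an illustration, the cost of one candidate group can be computed as below. This is a sketch under assumptions: the slides do not state the distance metric, so Manhattan distance between gate positions is assumed, and the gate coordinates, currents, and weights are made-up values.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two (x, y) positions (assumed metric)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def group_cost(group, currents, positions, i_max, w1=1.0, w2=1.0):
    """Cj = w1 * Cj1 + w2 * Cj2 for one candidate group of gates."""
    # Cj1: unused current capacity of the sleep transistor serving the group.
    slack = i_max - sum(currents[g] for g in group)
    # Cj2: sum of pairwise distances between the gates in the group.
    wiring = sum(manhattan(positions[u], positions[v])
                 for u, v in combinations(group, 2))
    return w1 * slack + w2 * wiring


# Hypothetical three-gate group with currents in uA and grid positions.
currents = {"Gu": 50, "Gv": 60, "Gw": 70}
positions = {"Gu": (0, 0), "Gv": (3, 0), "Gw": (0, 4)}
cost = group_cost(["Gu", "Gv", "Gw"], currents, positions, i_max=250)
print(cost)  # 84.0 (slack 70 + distances 3 + 4 + 7)
```

Raising w2 relative to w1 favours physically compact groups over groups that fill the sleep transistor's capacity.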
17
Clustering Heuristic
Create_Clusters ( )
1. Calculate distances between all gates;
2. Initialize maxgates_per_cluster = n;
3. Create clusters with single gates;
4. For cl = 2; cl ≤ maxgates_per_cluster
     Create_n_Gate_Clusters (cl)
5. For all clusters created
     calculate_cost ( )

Create_n_Gate_Clusters (cl)
1. For cluster of type cl
     create_new_cluster ( )
     While not done
       Choose gate with minimum distances
       If sum of currents ≤ capacity
         append gate to newly created cluster
       End If
       If total gates within cluster ≥ limit
         break;
     End While
   End For
2. Return newly created cluster
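One possible reading of this heuristic is sketched below: every gate seeds a cluster of each size, and the cluster grows by repeatedly taking the nearest gate that still fits within the current capacity. Measuring distance from the seed gate, the Manhattan metric, and all names and values are assumptions made for the example; the slides leave these details open.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) positions (assumed metric)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def grow_cluster(seed, size, currents, positions, i_max):
    """Grow a cluster of `size` gates around `seed`, nearest gates first."""
    cluster, load = [seed], currents[seed]
    nearest_first = sorted((g for g in positions if g != seed),
                           key=lambda g: manhattan(positions[seed], positions[g]))
    for gate in nearest_first:
        if len(cluster) >= size:
            break
        if load + currents[gate] <= i_max:  # respect the ST capacity
            cluster.append(gate)
            load += currents[gate]
    # Discard clusters that could not reach the requested size.
    return frozenset(cluster) if len(cluster) == size else None


def create_clusters(currents, positions, i_max, max_gates_per_cluster):
    """Return candidate clusters: all single gates plus grown clusters."""
    clusters = {frozenset([g]) for g in currents}
    for size in range(2, max_gates_per_cluster + 1):
        for seed in currents:
            candidate = grow_cluster(seed, size, currents, positions, i_max)
            if candidate is not None:
                clusters.add(candidate)
    return clusters


# Hypothetical row of four gates, 100 uA each, with Imax = 250 uA.
currents = {"G1": 100, "G2": 100, "G3": 100, "G4": 100}
positions = {"G1": (0, 0), "G2": (1, 0), "G3": (5, 0), "G4": (6, 0)}
candidates = create_clusters(currents, positions, 250, 3)
print(sorted(sorted(c) for c in candidates))
```

With these values no three-gate cluster fits under 250 µA, so the candidates are the four single gates plus the two neighbouring pairs.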
18
Set-Partitioning Technique
• Objective: Minimize Σ Cj Sj
• Subject to: 1. Σ of currents for Sj ≤ Imax
 2. Groups must cover all gates with no repetition.
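For a handful of gates, this selection can be solved by exhaustive search, as in the sketch below. It assumes that the candidate groups already satisfy the current constraint and that all costs are non-negative, which the pruning step relies on. The costs and names are hypothetical, and a realistic circuit would need an integer programming solver rather than this brute-force search.

```python
def best_partition(gates, group_costs):
    """Pick disjoint groups that cover every gate once at minimum total cost.

    group_costs maps a frozenset of gate names to a non-negative cost Cj.
    Returns (total_cost, chosen_groups); chosen_groups is None if no exact
    cover exists.
    """
    best = [float("inf"), None]

    def search(uncovered, chosen, cost):
        if cost >= best[0]:
            return  # cannot beat the best cover found so far
        if not uncovered:
            best[0], best[1] = cost, list(chosen)
            return
        pivot = min(uncovered)  # this gate must be covered by the next group
        for group, c in group_costs.items():
            if pivot in group and group <= uncovered:
                chosen.append(group)
                search(uncovered - group, chosen, cost + c)
                chosen.pop()

    search(frozenset(gates), [], 0.0)
    return best[0], best[1]


# Hypothetical candidate groups and costs for four gates.
fs = frozenset
group_costs = {
    fs({"G1"}): 5, fs({"G2"}): 5, fs({"G3"}): 5, fs({"G4"}): 5,
    fs({"G1", "G2"}): 3, fs({"G3", "G4"}): 4, fs({"G1", "G2", "G3"}): 6,
}
total, chosen = best_partition(["G1", "G2", "G3", "G4"], group_costs)
print(total, [sorted(g) for g in chosen])  # 7.0 [['G1', 'G2'], ['G3', 'G4']]
```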
19
Grouping of gates
[Figure: the standard-cell row layout (Vdd and gnd rails, cell height, ground rail, sleep device cavity, length Lmin) with gates G1 to G28 grouped into clusters.]
20
Computational Time
[Chart: BP/SP CPU time in seconds (axis from -200 to 2000) versus number of gates (28, 30, 31, 61, 160, 204); series: SP CPU Time, BP CPU Time.]
21
Results (% Savings)

Benchmarks: C432 = 27-channel interrupt controller C432; C499 = 32-bit Single Error Correcting C499; ALU = 4-bit 74181 ALU; Mult = 6-bit Multiplier; Parity = 32-bit Parity Checker; CLA = 4-bit CLA adder.

                     C432      C499      ALU       Mult      Parity    CLA
No. of gates         160       202       61        30        31        28

BP
Pdynamic to [1]      2 %       20 %      17 %      31 %      18 %      14 %
Pdynamic to [2]      0 %       19 %      14 %      23 %      16 %      12 %
Pleakage to [1]      99 %      95 %      93 %      95 %      92 %      96 %
Pleakage to [2]      89 %      89 %      83 %      78 %      85 %      93 %
ST_Area [1],[2]      99, 88 %  95, 89 %  93, 83 %  95, 78 %  92, 85 %  95, 92 %

SP
Pdynamic to [1]      2 %       9 %       11 %      19 %      9 %       7 %
Pdynamic to [2]      0 %       8 %       8 %       9 %       6 %       5 %
Pleakage to [1]      98 %      87 %      86 %      85 %      85 %      87 %
Pleakage to [2]      77 %      71 %      66 %      35 %      70 %      78 %
ST_Area [1],[2]      98, 76 %  86, 70 %  86, 67 %  85, 34 %  84, 69 %  87, 77 %
22
% Power Savings (Bin-Packing)
[Bar chart: % savings (0 to 100) per benchmark (CLA, Parity, Mult, ALU, Error, C432); series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2.]
23
% Power Savings (Set-Partitioning)
[Bar chart: % savings (0 to 100) per benchmark (CLA, Parity, Mult, ALU, Error, C432); series: Pdyn/1, Pdyn/2, Pleak/1, Pleak/2.]
24
% ST Area Saving (Bin-Packing)
[Bar chart: % ST area saving (0 to 100) per benchmark (CLA, Parity, Mult, ALU, Error, C432); series: St-Area[1], St-Area[2].]
25
% ST Area Saving (Set-Partitioning)
[Bar chart: % ST area saving (0 to 100) per benchmark (CLA, Parity, Mult, ALU, Error, C432); series: St-Area[1], St-Area[2].]
26
Conclusion
• The BP technique clusters gates in MTCMOS circuits. Pdynamic and Pleakage are reduced by 15% and 90%, respectively, compared to [1] and [2].
• The SP technique takes routing complexity into consideration. Pdynamic and Pleakage are reduced by 11% and 77%, respectively, compared to [1] and [2].
27
Extended Work Done
• A hybrid clustering technique that combines the BP and SP techniques is devised, to produce a more efficient and faster solution.
• Noise associated with ground bounce is taken as a design criterion (< 50 mV).
• Investigating the effect of different ST sizes on circuit parameters.
• Investigating the effect of the cost function weights w1 and w2 on circuit parameters.