Penelope: The NBTI-Aware Processor
Xavier Vera, Jaume Abella, Antonio Gonzalez
CTG/MTL Intel Barcelona Research Center – Intel Labs UPC
3/12/2007
MICRO 40, Chicago
CTG/MTL IBRCCTG/MTL IBRC 2
Purpose
NBTI is one of the main sources of failure affecting transistors
• NBTI affects PMOS transistors when the voltage at the gate is negative
• PBTI (affecting NMOS) is gaining importance
• FMAX and Vmin are impacted
• NBTI can be mitigated by controlling different parameters, such as operating voltage, temperature, geometry and duty cycle
We propose a set of microarchitectural mechanisms that manage the inputs and contents of blocks so that the duty cycle of PMOS transistors is lowered
Agenda
• Understanding NBTI and its impact
• Solutions for sequential blocks
• Solutions for combinational blocks
• Conclusion
NBTI Degradation
NBTI affects PMOS transistors when the voltage at the gate is negative: Si-H bonds break
• More interface traps (N_IT) make the transistor slower
Source: M.A. Alam, “On Reliability of Microelectronic Devices: An Introductory Lecture on Negative Bias Temperature Instability”, Sept. 2005
[Figure: hydrogen leaves the interface; N_IT grows, which means a slower transistor]
[Figure: INT register file bit bias per bit (0% to 100%)]
Impact of NBTI
PMOS transistors degrade only when they have a “0” at their gates
• Data is highly biased
• Memory cells suffer higher bias because of their design
• Worst bit determines lifetime
[Figure: memory cell]
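To make “worst bit determines lifetime” concrete, the sketch below computes a per-bit bias from snapshots of a stored word. It assumes the snapshots are taken at regular intervals and, since each of the two PMOS in a memory cell is stressed by one of the two stored values, defines a bit’s bias as the larger of its fraction of time at “0” and at “1” (50% is perfectly balanced). The helper names are hypothetical.

```python
from typing import List, Sequence


def bit_bias(samples: Sequence[int], width: int) -> List[float]:
    """Per-bit bias of a stored word (hypothetical helper).

    samples: snapshots of the word, assumed taken at regular intervals.
    Bias of a bit = max(fraction of time at 1, fraction of time at 0);
    0.5 means both PMOS of the cell are stressed equally.
    """
    if not samples:
        raise ValueError("need at least one sample")
    n = len(samples)
    biases = []
    for bit in range(width):
        ones = sum((value >> bit) & 1 for value in samples)
        frac_one = ones / n
        biases.append(max(frac_one, 1.0 - frac_one))
    return biases


def worst_bit_bias(samples: Sequence[int], width: int) -> float:
    """The most biased bit is the one that limits the structure's lifetime."""
    return max(bit_bias(samples, width))


print(bit_bias([0, 1, 2, 3], width=2))        # [0.5, 0.5]: both bits balanced
print(worst_bit_bias([0, 0, 0, 1], width=2))  # 1.0: upper bit never set
```

Small values, which are common in real data, leave the upper bits at “0” almost all the time, which is why a single worst bit can reach a bias close to 100%.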
NBTI: Current Solutions
Current solution is guardbanding
• Vmin is increased by ~10%, causing higher power dissipation
• FMAX is reduced by ~10-20%
Sources:
W. Abadeer, W. Ellis. Behavior of NBTI under AC Dynamic Circuit Conditions. In IRPS 2003.
M. Agostinelli et al. Erratic Fluctuations of SRAM Cache Vmin at the 90nm Process Technology Node. In IEDM 2005.
Mitigating the NBTI Problem
NBTI can be mitigated by controlling different parameters
• Operating voltage: lower voltage means lower NBTI
• Temperature: lower temperature means lower NBTI
• Geometry: wider transistors suffer lower NBTI
• Duty cycle: lower fraction of time with “0” at the gate means lower NBTI
Voltage, temperature and geometry impact delay, power and area. Duty cycle is easy to manage with microarchitectural techniques
Managing Contents in Storage
Lowest degradation is achieved when both PMOS degrade equally
• Hence, we want to store “0” and “1” 50% of the time each
Keep contents inverted 50% of the time in such a way that perfect balancing is achieved
• Data is highly biased
[Figure: memory cell contents over time t; values “0” “0” “1” “1” are stored as “0” “1” “1” “0” when contents are inverted 50% of the time]
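The balancing argument can be checked with a small sketch: a bit that always holds logical “0” becomes perfectly balanced if the cell physically stores it inverted during half of the time. The alternation schedule and function names below are illustrative assumptions, not the hardware mechanism.

```python
from typing import List, Sequence


def physical_trace(logical: Sequence[int], inverted: Sequence[bool]) -> List[int]:
    """Value physically held by a cell at each time step: the logical bit,
    flipped whenever the cell is in its inverted phase."""
    if len(logical) != len(inverted):
        raise ValueError("traces must have the same length")
    return [bit ^ int(inv) for bit, inv in zip(logical, inverted)]


def bias(trace: List[int]) -> float:
    """Larger of the fractions of time at "0" and at "1" (0.5 is optimal)."""
    zeros = trace.count(0) / len(trace)
    return max(zeros, 1.0 - zeros)


steps = 8
logical = [0] * steps                          # fully biased data
inverted = [t % 2 == 1 for t in range(steps)]  # inverted half of the time
print(bias(logical))                           # 1.0
print(bias(physical_trace(logical, inverted)))  # 0.5
```

The result assumes the inversion schedule is not correlated with the data: data alternating 0, 1 in lockstep with the inversion phase would be stored physically as all zeros, which is why the inversion must be arranged “in such a way that perfect balancing is achieved”.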
Choosing What to Invert
Extra gate fits cycle time?
• Yes: (De)invert periodically
• No: Do we store invalid entries?
  – Yes: Invert invalid entries
  – No: Invalidate and invert
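The flowchart above can be encoded as a tiny decision function. This is a sketch of our reading of the chart (yes/no branches as listed); the enum and function names are hypothetical.

```python
from enum import Enum


class Technique(Enum):
    DEINVERT_PERIODICALLY = "(De)invert periodically"
    INVERT_INVALID = "Invert invalid entries"
    INVALIDATE_AND_INVERT = "Invalidate and invert"


def choose_technique(extra_gate_fits_cycle: bool,
                     stores_invalid_entries: bool) -> Technique:
    """Pick an inversion technique for a storage block (hypothetical helper)."""
    if extra_gate_fits_cycle:
        # Access time is not critical: an XOR on the read path is affordable.
        return Technique.DEINVERT_PERIODICALLY
    if stores_invalid_entries:
        # Invalid entries can be inverted without losing any useful data.
        return Technique.INVERT_INVALID
    # Every entry may be useful (e.g. caches): evict first, then invert.
    return Technique.INVALIDATE_AND_INVERT


print(choose_technique(False, True).value)  # Invert invalid entries
```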
Case 1: (De)Invert Periodically
Two operating modes: non-inverted and inverted
50% of the time in each mode
[Figure: DL0 and register file, each with a Current Mode bit]
Mode: non-inverted = 0, inverted = 1
Source: Wilkerson et al, CTG/MTL
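A minimal sketch of the two operating modes, assuming a single mode bit per structure: writes and reads pass through an XOR with the current mode (the extra gate on the access path), and at the end of each period every entry is (de)inverted and the mode bit flips. The class, its methods and the notion of a fixed period are illustrative assumptions.

```python
from typing import List


class ModeInvertedStorage:
    """Storage array with one 'current mode' bit (hypothetical model).

    mode 0 = non-inverted, mode 1 = inverted. Cells hold the physical
    values; read() and write() apply the XOR that hides the inversion.
    """

    def __init__(self, entries: int, width: int) -> None:
        self.mask = (1 << width) - 1
        self.cells: List[int] = [0] * entries
        self.mode = 0

    def _apply_mode(self, value: int) -> int:
        # The "extra gate": XOR with all ones when in inverted mode.
        return value ^ self.mask if self.mode else value

    def write(self, idx: int, value: int) -> None:
        self.cells[idx] = self._apply_mode(value & self.mask)

    def read(self, idx: int) -> int:
        return self._apply_mode(self.cells[idx])

    def toggle_mode(self) -> None:
        # (De)invert every entry, then flip the mode so reads stay correct.
        self.cells = [c ^ self.mask for c in self.cells]
        self.mode ^= 1


store = ModeInvertedStorage(entries=4, width=8)
store.write(0, 0x0F)
store.toggle_mode()          # e.g. at the end of each period
print(hex(store.read(0)))    # 0xf: logical value unchanged
print(hex(store.cells[0]))   # 0xf0: physically inverted
```

Calling `toggle_mode` at a fixed period gives each mode 50% of the time, so every cell spends half of its life storing the complement of its data.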
Case 2: Inverting Invalid Contents
Characteristics:
• 50% of the storage contents are inverted at any time
• In the long run all entries will spend 50% of the time inverted
Low overhead
• Existing ports are used (no extra ports required)
• Some extra logic is required (off the critical path)
• Cycle time, TDP and area are roughly the same
Inverting Invalid Contents: Register File
Observation: registers spend more than 50% of the time with invalid contents
• Contents are invalid from the moment a register is released until it is written again
We invert a register’s contents when it is released
[Figure: RINV logic: every k cycles a register is sampled; if its tag has been released and the port is not busy, its data is inverted in the register file]
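The RINV flow can be sketched as a cycle-level model. The sketch assumes the sampled register is chosen round-robin, that a released register is inverted only once per release, and that a single `port_busy` flag says whether a port is free this cycle; none of these details come from the slides, and all names are hypothetical.

```python
from typing import List


class RegisterFileWithRINV:
    """Invert released (invalid) registers using spare port cycles (sketch)."""

    def __init__(self, num_regs: int, width: int, k: int) -> None:
        self.mask = (1 << width) - 1
        self.k = k                                  # sampling period in cycles
        self.values: List[int] = [0] * num_regs
        self.released = [True] * num_regs           # all registers start free
        self.inverted = [False] * num_regs          # inverted since release?
        self.ptr = 0                                # round-robin sample pointer

    def write(self, reg: int, value: int) -> None:
        """A new producer writes the register: contents become valid."""
        self.values[reg] = value & self.mask
        self.released[reg] = False
        self.inverted[reg] = False

    def release(self, reg: int) -> None:
        self.released[reg] = True

    def tick(self, cycle: int, port_busy: bool) -> None:
        """Every k cycles, if a port is free, sample one register."""
        if cycle % self.k != 0 or port_busy:
            return
        reg = self.ptr
        self.ptr = (self.ptr + 1) % len(self.values)
        if self.released[reg] and not self.inverted[reg]:
            self.values[reg] ^= self.mask           # write back inverted data
            self.inverted[reg] = True


rf = RegisterFileWithRINV(num_regs=2, width=8, k=4)
rf.write(0, 0x03)
rf.write(1, 0x05)
rf.release(0)
rf.tick(cycle=8, port_busy=False)
print(hex(rf.values[0]))  # 0xfc: released register now stored inverted
```

Because inversion only uses cycles in which a port is idle, the logic stays off the critical path, matching the low-overhead claim for this case.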
Results for Integer Register File
Guardband in FMAX can be smaller (higher performance) and Vmin can be reduced (lower power)
• Bias without inverting: 96% (46 points above the optimal 50%)
• Bias with inverting: 53.5% (3.5 points above the optimal 50%)
[Figure: Integer register file bit bias for bits 1 to 32 (0% to 100%), No invert vs. Invert]
Inverting Invalid Contents: Scheduler
Entries are busy more than 50% of the time
• Some entries are self-balanced because they store tags
• Entries that are busy < 50% behave like the register file
• For entries busy > 50%, inverting is not enough
  – If a bit holds “0” (“1”) most of the time, write “1” (“0”) when idle
Some entries are too biased or too busy, so perfect balancing cannot be achieved
• Lower benefits in terms of guardband
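A short model shows why some scheduler entries cannot be fully balanced. It assumes one bit whose busy time and “0” fraction while busy are known, and fills all idle time with the complement of its majority value, as the slide suggests; the function name and parameters are hypothetical.

```python
def bias_with_idle_fill(busy_fraction: float,
                        zero_fraction_when_busy: float) -> float:
    """Bias of one bit when idle time holds the complement of its busy-time
    majority value (hypothetical model).

    busy_fraction: fraction of time the entry holds real data (0..1).
    zero_fraction_when_busy: fraction of busy time the bit holds "0" (0..1).
    """
    if not (0.0 <= busy_fraction <= 1.0 and 0.0 <= zero_fraction_when_busy <= 1.0):
        raise ValueError("fractions must be in [0, 1]")
    zero_time = busy_fraction * zero_fraction_when_busy
    if zero_fraction_when_busy < 0.5:
        # Mostly "1" while busy: write "0" during idle time.
        zero_time += 1.0 - busy_fraction
    # Otherwise mostly "0" while busy: idle time holds "1" (no zero time added).
    return max(zero_time, 1.0 - zero_time)


print(bias_with_idle_fill(0.5, 1.0))  # 0.5: busy half the time, fully balanced
print(bias_with_idle_fill(0.9, 1.0))  # 0.9: too busy to balance
```

Whenever busy time alone keeps the bit at its majority value for more than 50% of the total time, no idle-time policy can bring the bias down to 50%, which is the “too biased or too busy” case.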
Results for Scheduler
Worst bit bias reduces from 100% to 60% (ideally we want 50%)
[Figure: Scheduler bit bias per bit (0% to 100%), No invert vs. Invert]
Case 3: Invalidate & Invert: Cache Structures
HOW/WHAT TO INVERT
Any entry may be useful in caches
• If an entry is inverted, we need to invalidate it
Evict likely-dead entries (most of them are dead)
• Those close to the LRU position are unlikely to be reused
WHEN TO INVERT
Keep 50% of entries inverted at any time
• Easy to implement
• May lose some performance
Invert more than 50% when no performance loss is expected, and less than 50% otherwise
• A bit more complex
• Has fewer glass-jaws
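The “invert the entries closest to LRU” policy can be sketched per cache set. The sketch assumes an LRU order is available for each set and that the selection is re-evaluated periodically as the order changes; the `fraction` parameter stands in for the 50% (or adaptive) target, and all names are hypothetical.

```python
from typing import List, Set


def ways_to_invert(lru_order: List[int], fraction: float = 0.5) -> Set[int]:
    """Pick the ways of one set to invalidate and invert.

    lru_order: way indices ordered from MRU (first) to LRU (last).
    fraction: share of ways to keep inverted (0.5 = "keep 50%" policy).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    count = round(len(lru_order) * fraction)
    # Ways nearest LRU are the least likely to be reused.
    return set(lru_order[len(lru_order) - count:])


def apply_inversion(valid: List[bool], inverted: List[bool], target: Set[int]) -> None:
    """Invalidate newly inverted ways: their contents become unusable."""
    for way in range(len(valid)):
        if way in target and not inverted[way]:
            valid[way] = False
        inverted[way] = way in target


order = [3, 1, 0, 2, 7, 6, 5, 4]          # 8-way set, MRU first
print(sorted(ways_to_invert(order)))       # [4, 5, 6, 7]
print(sorted(ways_to_invert(order, 0.6)))  # [2, 4, 5, 6, 7]
```

Ways that leave the inverted group stay invalid until they are refilled, so the only cost of the policy is the occasional miss on an entry that was not actually dead.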
Results for DL0 and DTLB
Near-optimal balancing is achieved
Benefits in terms of both FMAX and Vmin overcome performance loss (most of the FMAX guardband can be removed)
[Figure: DTLB performance loss (0.0% to 2.0%) for 128-entry, 64-entry and 32-entry 8-way DTLBs, with the “50% always” and “60% + disable” policies]
[Figure: DL0 performance loss (0.0% to 2.0%) for 32KB, 16KB and 8KB 8-way DL0 caches, with the “50% always” and “60% + disable” policies]
Managing Inputs for Combinational Logic
During idle periods inputs remain the same
• The same PMOS degrade
If different inputs are used, different PMOS will degrade
• None of the PMOS will degrade too much
• Maximum duty cycle is reduced
Set special inputs during idle periods to reduce maximum degradation
• Alternate different inputs in a round-robin fashion during idle periods
More details in the paper
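The effect of round-robin idle inputs can be sketched with a simplified model: a single 3-bit input where each bit directly drives a PMOS gate, so a “0” on a bit stresses that PMOS (an assumption; in a real adder the mapping from operands to stressed PMOS depends on the circuit). The idle patterns 000 and 111 follow the adder results; function names are hypothetical.

```python
from itertools import cycle
from typing import Iterable, Iterator, List, Optional, Sequence


def hold_last(inputs: Iterable[Optional[int]], initial: int = 0) -> Iterator[int]:
    """Baseline: during idle cycles (None) the last operand stays on the inputs."""
    last = initial
    for value in inputs:
        if value is not None:
            last = value
        yield last


def round_robin_idle(inputs: Iterable[Optional[int]],
                     idle_patterns: List[int]) -> Iterator[int]:
    """During idle cycles, drive idle_patterns in round-robin order instead."""
    if not idle_patterns:
        raise ValueError("need at least one idle pattern")
    patterns = cycle(idle_patterns)
    for value in inputs:
        yield next(patterns) if value is None else value


def zero_duty_per_bit(values: Sequence[int], width: int) -> List[float]:
    """Fraction of cycles each input bit is "0" (stressing the PMOS it drives)."""
    return [sum(1 for v in values if ((v >> bit) & 1) == 0) / len(values)
            for bit in range(width)]


trace = [5, None, None, None, None, 3]                    # None = idle cycle
baseline = list(hold_last(trace))                         # [5, 5, 5, 5, 5, 3]
balanced = list(round_robin_idle(trace, [0b000, 0b111]))  # [5, 0, 7, 0, 7, 3]
print(max(zero_duty_per_bit(baseline, 3)))  # 0.8333333333333334
print(max(zero_duty_per_bit(balanced, 3)))  # 0.5
```

Holding the last operand keeps the same bit at “0” for most of the idle time, while alternating two complementary patterns spreads the stress so that no input is “0” for more than half of the cycles in this trace.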
Results for an Adder
Inputs have been chosen to degrade different PMOS
Few inputs are enough to balance the degradation
[Figure: Adder NBTI guardband (0% to 24%) for different input mixes: real inputs; 30% real + 000 + 111; 21% real + 000 + 111; 11% real + 000 + 111]
Conclusion
Whole processor protected with minor modifications
Sequential blocks
• Non-critical access time: periodically invert all contents
• Critical access time: invert invalid contents
Combinational blocks
• Set special inputs during idle periods
Guardbanding may be reduced
• FMAX is higher, so performance is boosted; Vmin is lower, so power is reduced
Future work: look at NMOS (PBTI)