Post on 04-Feb-2016
Cost-Effective Register File Soft Error Reduction
Pablo Montesinos, Wei Liu and Josep Torrellas, University of Illinois at Urbana-Champaign
Overview
Study of register file vulnerability to SDC (Silent Data Corruption)
Shield - cost-effective protection for register files
Highlighting policies and techniques used in Shield
Experiments - Results
Register File AVF
RF-AVF is the probability that a fault in the register file leads to an error.
A register's lifetime is divided into PreWrite, Useful, and PostLastRead parts.
Based on the AVF calculation, the lifetime of a bit can be divided into ACE (Architecturally Correct Execution) and un-ACE cycles.
Register File AVF
During the PreWrite period, the register is un-ACE.
If the value is read at least once after the write, the register is in the ACE state from the write until its last read.
After the last read, the register switches back to un-ACE for the PostLastRead period.
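The lifetime split above can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the `RegVersion` record, its field names, and the assumption that every cycle between the write and the last read counts as ACE are ours.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RegVersion:
    """One register version (hypothetical record, cycle numbers are integers)."""
    write: int                 # cycle in which the value is written
    last_read: Optional[int]   # cycle of the last read, None if never read


def ace_cycles(v: RegVersion) -> int:
    """Useful (ACE) cycles: write to last read; zero if the value is never read."""
    if v.last_read is None:
        return 0
    return v.last_read - v.write


def register_file_avf(versions: List[RegVersion], num_regs: int, total_cycles: int) -> float:
    """Fraction of all register-cycles that are ACE (assumed AVF definition)."""
    total_ace = sum(ace_cycles(v) for v in versions)
    return total_ace / (num_regs * total_cycles)


# Example: two versions observed over 100 cycles in a 2-register file.
versions = [RegVersion(write=10, last_read=50), RegVersion(write=5, last_read=None)]
avf = register_file_avf(versions, num_regs=2, total_cycles=100)
```

PreWrite and PostLastRead cycles never enter the sum, which is why a version that is written but never read contributes nothing to the AVF.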
Highlighting Insights (1)
The combined Useful time of all registers is a small percentage of the total.
Highlighting Insights (1)
The average number of useful (live) registers is less than 20 (SPECint) and 17 (SPECfp).
It is thus possible to reduce the vulnerability of the register file by protecting only a subset of carefully chosen registers at a time.
Highlighting Insights (2)
A few long-lived register versions contribute a large share of the total useful time.
On average, less than 10% of register versions are long-lived.
Highlighting Insights (2)
On average 40% of useful time comes from the few long-lived versions.
In SPECfp, 5% of long-lived versions account for 46% of the useful time.
Motivation
Register files have a very high access rate.
This leads to high temperature and hence a lower Qcrit for the devices.
An error in the RF can propagate with high failure probability.
If we isolate a few register versions by predicting their lifetimes and protect only those versions, high reliability can be achieved with limited overhead.
Shield - Architecture
Life-Time Prediction
Shielding Decision
Register Error Check
Error Recovery
Reg-Version Lifetime Prediction
P12 => Used(1), Renamed(1)
P7 => Used(0), Renamed(1)
Shielding Decision
These prediction bits are stored as status in the ECC table.
The decision to shield an incoming register version being written depends on:
Availability of a free ECC-table entry.
An entry with the same register number already in the ECC table, which is replaced by the new entry.
An existing reg-version with a shorter predicted lifetime than the incoming reg-version, which is replaced.
Replacement policy:
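The three shielding rules can be sketched as a small table model. This is an illustration under stated assumptions: the `EccTable` class, the integer "lifetime rank" (larger means a longer predicted lifetime), and the order in which the rules are tried are hypothetical choices, not details given in the slides.

```python
from typing import Dict


class EccTable:
    """Toy model of the ECC table: register number -> predicted lifetime rank."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: Dict[int, int] = {}

    def shield(self, reg: int, lifetime_rank: int) -> bool:
        """Try to protect an incoming register version; return True if shielded."""
        # Same register number already present: the new version replaces it.
        if reg in self.entries:
            self.entries[reg] = lifetime_rank
            return True
        # A free entry is available.
        if len(self.entries) < self.capacity:
            self.entries[reg] = lifetime_rank
            return True
        # Otherwise evict the entry with the shortest predicted lifetime,
        # but only if it is shorter than the incoming version's.
        victim = min(self.entries, key=self.entries.get)
        if self.entries[victim] < lifetime_rank:
            del self.entries[victim]
            self.entries[reg] = lifetime_rank
            return True
        return False  # incoming version is left unprotected


table = EccTable(capacity=2)
results = [
    table.shield(12, 2),  # free entry
    table.shield(7, 0),   # free entry
    table.shield(3, 1),   # evicts P7 (rank 0 < 1)
    table.shield(9, 0),   # no entry with a shorter lifetime: not shielded
    table.shield(12, 0),  # same register number: replaced in place
]
```

Checking the same-register case first keeps the table from holding two entries for one physical register.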
Register Error Check & Recovery
On a read request, the register data is sent to both the original datapath and Shield.
If the Reg# matches a tag entry, the reg-data is checked for errors at the ECC checker.
If an error is detected:
The processor stalls the instruction I reading reg P.
The reg-data is corrected and written into the RF.
The oldest instruction in the ROB that reads reg P and all succeeding instructions are flushed.
The processor resumes from the flushed instruction.
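The check-and-recover sequence can be sketched as follows. The slides do not say which ECC code Shield uses, so the single-error-correcting Hamming scheme here is an assumption, and the ROB model (a list of instructions, oldest first, each with its source registers) and all function names are hypothetical.

```python
from typing import List, Set, Tuple


def data_positions(width: int) -> List[int]:
    """1-indexed codeword positions holding data bits (the non-powers of two)."""
    positions: List[int] = []
    pos = 1
    while len(positions) < width:
        if pos & (pos - 1):  # non-zero only when pos is not a power of two
            positions.append(pos)
        pos += 1
    return positions


def check_bits(value: int, width: int) -> int:
    """Hamming check bits: XOR of the positions of all set data bits."""
    code = 0
    for i, pos in enumerate(data_positions(width)):
        if (value >> i) & 1:
            code ^= pos
    return code


def check_and_correct(value: int, stored_code: int, width: int) -> Tuple[int, bool]:
    """Return (corrected value, error_detected), assuming at most one flipped bit."""
    syndrome = check_bits(value, width) ^ stored_code
    if syndrome == 0:
        return value, False
    positions = data_positions(width)
    if syndrome in positions:
        # The syndrome names the flipped data bit; flip it back.
        return value ^ (1 << positions.index(syndrome)), True
    # Syndrome does not name a data bit: under the single-bit assumption the
    # flip is in the stored check bits, so the data is returned unchanged.
    return value, True


Instr = Tuple[str, Set[str]]  # (name, source registers)


def flush_from_oldest_reader(rob: List[Instr], reg: str) -> Tuple[List[Instr], List[Instr]]:
    """Split the ROB at the oldest instruction reading reg: (kept, flushed)."""
    for i, (_, sources) in enumerate(rob):
        if reg in sources:
            return rob[:i], rob[i:]
    return rob, []


value = 0b10110010
code = check_bits(value, 8)          # generated when the register is written
corrupted = value ^ (1 << 5)         # soft error flips one bit in the RF
fixed, had_error = check_and_correct(corrupted, code, 8)
rob = [("i1", {"P3"}), ("i2", {"P12"}), ("i3", {"P4"})]
kept, flushed = flush_from_oldest_reader(rob, "P12")
```

After the corrected value is written back to the RF, execution would resume from the first flushed instruction, here `i2`.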
Experiments - Results
AVF computation for RF with shield
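One way to picture the computation is sketched below. It assumes that useful cycles spent under Shield protection no longer count as vulnerable, which is our reading rather than a formula given in the slides; the function name and the input numbers are purely illustrative.

```python
from typing import List, Tuple


def avf_reduction(
    versions: List[Tuple[int, int]], num_regs: int, total_cycles: int
) -> Tuple[float, float, float]:
    """versions holds (useful_cycles, protected_useful_cycles) per register version.

    Returns (baseline AVF, AVF with Shield, relative reduction).
    """
    register_cycles = num_regs * total_cycles
    useful = sum(u for u, _ in versions)
    # Assumption: a useful cycle covered by an ECC entry is no longer vulnerable.
    vulnerable = sum(u - p for u, p in versions)
    base = useful / register_cycles
    shielded = vulnerable / register_cycles
    reduction = 1.0 - shielded / base if base else 0.0
    return base, shielded, reduction


# Illustrative numbers only: three versions in a 4-register file over 100 cycles.
base, shielded, reduction = avf_reduction([(40, 40), (10, 0), (50, 30)], 4, 100)
```

In this made-up example the long-lived versions get most of the protection, so covering 70 of 100 useful cycles removes 70% of the AVF.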
Experiments - Results
The AVF of intREG is reduced by a different amount under each replacement policy:
LRU = 31%
Effective = 63%
OptEffective = 84% (Effective + pinning of global pointers to particular ECC entries)
The AVF of fpREG can be reduced by up to 100%, because fewer fp registers are in the useful state.
Power and Area Impact
Shield uses only 3 ECC generators and 3 ECC checkers.
Shield has 45% power overhead over a plain register file. (Full ECC has 2X)
Shield introduces an overall 10% area overhead.
Conclusion
A cost-effective architectural technique has been proposed that reduces the vulnerability (AVF) of the integer RF by up to 84%.
The area and power overheads are a modest price for the reliability achieved.