Using Software Rules To Enhance FPGA Reliability
Chandru Mirchandani
Lockheed-Martin Transportation & Security Solutions
September 7-9, 2005
P226/MAPLD2005 MIRCHANDANI
Introduction
To meet…
• System Objectives
Develop a Process to…
• Verify FPGA Capability
• Validate FPGA Reliability
• Enhance FPGA Quality
By developing an Adaptive Model…
…using Software Rules…
Problem Statement
Requirement: Display sensor data in near-real time
Constraints: No loss of data, data quality & integrity, and timeliness
Information: Uncertain… to make design decision with lowest risk of failure
Solution… Adaptive Model
Software Reliability
Develop Criteria for Design Objective Acceptance
Prioritize tasks or functions in order of criticality
Develop metrics to measure performance of tasks with respect to constraints
Evaluate design options based on measured reliability metrics
Typical Software Options
Critical software functions are distributed as redundant instances on multiple processors, thus minimizing the loss of service due to a processor failure…
[Figure: Processor 1 and Processor 2, running the primary (I-ary) and secondary (II-ary) instances of Application A1.]
Typical Software Options (contd.)
Distributing system level functions so that multiple users can independently use the function…
[Figure: Processor 1 and Processor 2, each running an instance of Application B1.]
Typical Software Options (contd.)
Data replication to minimize the loss of critical data in the event of a processor failure or software system failure…
[Figure: Processor 1 and Processor 2, each running an instance of Application C1, with Storage 1 and Storage 2.]
Redundant Instances of Software
Initially, detect, contain and recover from faults as soon as possible
If this is not possible, allow control to be passed to the redundant instance within the reliability and availability requirements levied on the system
Finally, include language-defined mechanisms to detect and prevent the propagation of errors
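
The three points above can be sketched in Python, assuming that the "language-defined mechanism" is ordinary exception handling. The names `run_with_failover`, `faulty_reader` and `backup_reader` are hypothetical and do not appear in the slides; this is a minimal illustration, not the author's implementation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_failover(primary: Callable[[], T], secondary: Callable[[], T]) -> T:
    """Run the primary instance; on a detected fault, pass control to the secondary."""
    try:
        # Detect: any exception raised by the primary is treated as a fault.
        return primary()
    except Exception:
        # Contain: the exception is caught here and does not propagate further.
        # Recover: control passes to the redundant instance.
        return secondary()


def faulty_reader() -> str:
    """Hypothetical primary instance that always fails."""
    raise RuntimeError("sensor read failed")  # simulated fault


def backup_reader() -> str:
    """Hypothetical redundant instance."""
    return "sensor data"


result = run_with_failover(faulty_reader, backup_reader)
```

A real system would also have to bound the time spent before handing over, since the slide ties the handover to the reliability and availability requirements; the sketch leaves that out.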
Methodology
Estimate the reliability based on instruction set and operational usage
Re-design critical elements to decrease risk
Re-evaluate the risk of failure based on a change in critical task design based on performance and requirements
Re-evaluate the reliability based on failure rate
Factor in the Uncertainty in Evaluation
Task Times
Task Class        Symbol  Steps        Step Time ($s_{task}$)  Task Time                 Total Tasks Time ($t_{task}$)
Reading           r       $x_{ri}$     $s_r$                   $s_r \cdot x_{ri}$        $(s_r \cdot x_{ri}) \cdot n_r = t_r$
Parsing           p       $x_{pi}$     $s_p$                   $s_p \cdot x_{pi}$        $(s_p \cdot x_{pi}) \cdot n_p = t_p$
Pre-processing    p1      $x_{p1i}$    $s_{p1}$                $s_{p1} \cdot x_{p1i}$    $(s_{p1} \cdot x_{p1i}) \cdot n_{p1} = t_{p1}$
Monitoring        M       $x_{Mi}$     $s_M$                   $s_M \cdot x_{Mi}$        $(s_M \cdot x_{Mi}) \cdot n_M = t_M$
Sorting           s       $x_{si}$     $s_s$                   $s_s \cdot x_{si}$        $(s_s \cdot x_{si}) \cdot n_s = t_s$
Processing        P       $x_{Pi}$     $s_P$                   $s_P \cdot x_{Pi}$        $(s_P \cdot x_{Pi}) \cdot n_P = t_P$
Post-processing   p2      $x_{p2i}$    $s_{p2}$                $s_{p2} \cdot x_{p2i}$    $(s_{p2} \cdot x_{p2i}) \cdot n_{p2} = t_{p2}$
Status-gathering  S       $x_{Si}$     $s_S$                   $s_S \cdot x_{Si}$        $(s_S \cdot x_{Si}) \cdot n_S = t_S$
Writing           w       $x_{wi}$     $s_w$                   $s_w \cdot x_{wi}$        $(s_w \cdot x_{wi}) \cdot n_w = t_w$
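
As a rough illustration of the table above, the total time for a task class is the step time multiplied by the number of steps and then by n. The slides do not define n; the sketch below assumes it is the number of tasks of that class. The function name `total_task_time` and all numeric values are hypothetical.

```python
def total_task_time(step_time: float, steps: int, n_tasks: int) -> float:
    """Total time t = (s * x) * n for one task class.

    step_time: s, time per step
    steps:     x, number of steps in one task
    n_tasks:   n, assumed here to be the number of tasks of this class
    """
    task_time = step_time * steps  # time for a single task: s * x
    return task_time * n_tasks     # total for the class: (s * x) * n


# Hypothetical values, not taken from the slides.
t_reading = total_task_time(step_time=0.002, steps=50, n_tasks=10)
```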
FPGA System - Conceptual
[Figure: between Input and Output, duplicated blocks labelled SR, SP and SPP, presumably the Reading, Parsing and Pre-Processing subsystems.]
Consider an FPGA-based system comprising the Reading, Parsing and Pre-Processing Tasks…
…each Task is a subsystem
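Task Reliability Block Diagram
[Figure: reliability block diagram for the Reading task. Two redundant Reading branches, each with HW and SW in series (AND), are combined in parallel (OR) and followed in series by the Reading common cause failure (CCF) block.]
Redundant branches: $1 - \{1 - \exp(-(1-\gamma_h)\,\lambda_{shwi}\,t)\,\exp(-(1-\gamma_s)\,\lambda_{sswi}\,t)\}^2$
CCF block: $\exp(-\gamma_h\,u_h\,\lambda_{hwi}\,t)\,\exp(-\gamma_s\,u_s\,\lambda_{swi}\,t)$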
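
The block diagram can be read as a small function: two identical branches in parallel, in series with a common cause block. The sketch below is one possible reading, not the author's code. It assumes the independent intensity of each element is the raw intensity scaled by usage and by (1 - γ), with that factor applied once; the expression as extracted applies (1 - γ) to λ_shwi, so this choice is an assumption. The function and argument names are hypothetical.

```python
import math


def subsystem_reliability(lam_hw, lam_sw, u_h, u_s, gamma_h, gamma_s, t):
    """Reliability of one task subsystem: redundant pair in series with a CCF block."""
    # Independent failure intensities (assumption: (1 - gamma) applied once).
    lam_hw_ind = lam_hw * u_h * (1.0 - gamma_h)
    lam_sw_ind = lam_sw * u_s * (1.0 - gamma_s)
    # One branch: HW AND SW in series.
    r_branch = math.exp(-lam_hw_ind * t) * math.exp(-lam_sw_ind * t)
    # Two identical branches in parallel (OR).
    r_pair = 1.0 - (1.0 - r_branch) ** 2
    # Common cause block in series with the pair.
    r_ccf = math.exp(-gamma_h * u_h * lam_hw * t) * math.exp(-gamma_s * u_s * lam_sw * t)
    return r_pair * r_ccf


# Illustrative call with made-up inputs.
r_example = subsystem_reliability(
    lam_hw=0.3, lam_sw=0.3, u_h=0.3, u_s=0.3, gamma_h=0.01, gamma_s=0.5, t=0.2
)
```

When both common cause fractions equal 1 the redundant pair contributes nothing and the result reduces to a plain series system, which is a convenient sanity check.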
Definitions
Calendar Time – $\tau$: Mission Time to Calculate the Reliability
Execution – $e_i$: Percentage of Mission Time used by the Task (or Subsystem)
Execution Time – $t$: $e_i \cdot \tau$
Usage for SW – $u_s$: Percentage of the Total software used by the Task
Usage for HW – $u_h$: Percentage of Area of the Active portion of the Device used by Task
$\lambda_{shwi}$: Failure Intensity of Task $i$ hardware with respect to Execution time
$\lambda_{sswi}$: Failure Intensity of Task $i$ software with respect to Execution time
$\gamma_{hi}$: Fraction of Task $i$ hardware failures that are common cause failures
$\gamma_{si}$: Fraction of Task $i$ software failures that are common cause failures
Parameters & Derivations
Failure Intensity: $\lambda_{shwi} = \lambda_{hwi} \cdot u_h \cdot (1 - \gamma_h)$
Failure Intensity: $\lambda_{sswi} = \lambda_{swi} \cdot u_s \cdot (1 - \gamma_s)$
Common Cause: $\lambda_{hwi} \cdot u_h \cdot \gamma_h$ and $\lambda_{swi} \cdot u_s \cdot \gamma_s$
Execution Time $t$: $e_i \cdot \tau$
$R_{SSi}$: Subsystem Reliability
System Reliability $R_S$: $R_{SS1} \cdot R_{SS2} \cdot R_{SS3}$
Parameter                  Reading  Parsing  Pre-Processing
Usage SW - $u_s$           0.3      0.3      0.4
Usage HW - $u_h$           0.3      0.4      0.3
$\lambda_{hwi}$            0.3      0.4      0.3
$\lambda_{swi}$            0.3      0.4      0.3
Execution - $e_i$          0.2      0.1      0.7
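
To make the derivations concrete, the sketch below applies them to the parameter values above. It computes each task's execution time and splits the usage-scaled hardware intensity into its independent and common cause parts. The mission time `TAU` is not given in the slides and is set to 1.0 purely as an assumption; `GAMMA_H` is an example value chosen for illustration, and the helper name `split_hw_intensity` is hypothetical.

```python
TASKS = {
    # name: (u_s, u_h, lam_hw, lam_sw, e_i), values from the parameter table
    "Reading":        (0.3, 0.3, 0.3, 0.3, 0.2),
    "Parsing":        (0.3, 0.4, 0.4, 0.4, 0.1),
    "Pre-Processing": (0.4, 0.3, 0.3, 0.3, 0.7),
}

TAU = 1.0       # mission time: assumed, not stated in the slides
GAMMA_H = 0.01  # example HW common cause fraction, chosen for illustration


def split_hw_intensity(lam_hw, u_h, gamma_h):
    """Return (independent, common cause) parts of the usage-scaled HW intensity."""
    independent = lam_hw * u_h * (1.0 - gamma_h)
    common_cause = lam_hw * u_h * gamma_h
    return independent, common_cause


for name, (u_s, u_h, lam_hw, lam_sw, e_i) in TASKS.items():
    t = e_i * TAU  # execution time of this task
    ind, ccf = split_hw_intensity(lam_hw, u_h, GAMMA_H)
    print(f"{name}: t={t:.2f}, hw independent={ind:.5f}, hw common cause={ccf:.5f}")
```

The two parts always add back up to the usage-scaled intensity, so the common cause fraction only moves failure rate between the redundant branches and the shared block.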
System Configuration Options
Configuration              HW Common Cause Fraction $\gamma_h$  SW Common Cause Fraction $\gamma_s$
Same Code & Device         0.01                                 1
Same Code & Diff Devices   0.0025                               0.9975
Diff Code & Same Device    0.01                                 0.5
Diff Code & Devices        0.0025                               0.1
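
One way to compare the four options is to evaluate the series system of three subsystems for each pair of common cause fractions. The sketch below is written under stated assumptions: a mission time of 1.0 (not given in the slides), and the (1 - γ) factor applied once to the usage-scaled intensities. Because of these assumptions its output should not be expected to match the values reported in the presentation. The name `system_reliability` is hypothetical.

```python
import math

# (u_s, u_h, lam_hw, lam_sw, e_i) per task, from the parameter table
TASKS = [
    (0.3, 0.3, 0.3, 0.3, 0.2),  # Reading
    (0.3, 0.4, 0.4, 0.4, 0.1),  # Parsing
    (0.4, 0.3, 0.3, 0.3, 0.7),  # Pre-Processing
]

# configuration: (gamma_h, gamma_s), from the configuration table
CONFIGS = {
    "Same Code & Device":       (0.01, 1.0),
    "Same Code & Diff Devices": (0.0025, 0.9975),
    "Diff Code & Same Device":  (0.01, 0.5),
    "Diff Code & Devices":      (0.0025, 0.1),
}


def system_reliability(gamma_h, gamma_s, tau=1.0):
    """Product of the three subsystem reliabilities; tau is an assumed mission time."""
    r_system = 1.0
    for u_s, u_h, lam_hw, lam_sw, e_i in TASKS:
        t = e_i * tau            # execution time of this task
        rate_hw = lam_hw * u_h   # usage-scaled hardware intensity
        rate_sw = lam_sw * u_s   # usage-scaled software intensity
        # One branch: independent HW and SW failures in series.
        r_branch = math.exp(-((1 - gamma_h) * rate_hw + (1 - gamma_s) * rate_sw) * t)
        # Two redundant branches in parallel.
        r_pair = 1.0 - (1.0 - r_branch) ** 2
        # Common cause block in series with the pair.
        r_ccf = math.exp(-(gamma_h * rate_hw + gamma_s * rate_sw) * t)
        r_system *= r_pair * r_ccf
    return r_system


results = {name: system_reliability(g_h, g_s) for name, (g_h, g_s) in CONFIGS.items()}
for name, r in results.items():
    print(f"{name}: {r:.4f}")
```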
Results
Option  Configuration             FPGA-based System Reliability
1       Same Code, Same Devices   0.895726564
2       Same Code, Diff Devices   0.895973815
3       Diff Code, Same Devices   0.944752579
4       Diff Code, Diff Devices   0.98356125
Conclusions
Cost and Schedule Slips
Development Delays and Costs
Adaptive Model
Optimization and Design Constraints
Contact Address: chandru.j.mirchandani@lmco.com