Techniques for automated localization and correction of
design errors
Jaan Raik, Tallinn University of Technology
Design error debug
“There has never been an unexpectedly short debugging period in the history of computers.”
Steven Levy
Designs are getting bigger
Designs are getting costlier
• Cost per function decreases by 25-30% annually
• The IC market grows by 15% annually
• But …
The cost of chip design keeps on growing.
• In 1981, development of a leading-edge CPU cost 1 M$
• … today it costs more than 300 M$!
• Why do the costs increase?
Design automation crisis
• Productivity gap: 58% versus 21% annually
[Figure: transistors on the die over time, up to today: technology's capabilities versus designer's productivity]
[Figure: person months per 20 000 logic gates for system design, logic design and physical design across tool generations (< 1979, ~1983, 1986, 1988-92, 1992-95, ~1996-...): simulation and schematic entry; placement & routing; hierarchy, generators; logic synthesis; high-level synthesis / system-level synthesis; specialized high-level synthesis]
Verification and debugging
• Debug = Localization + Correction
• ~2/3 of development time for verification
• ~2/3 of verification time for debug
• Thus nearly half of the development cycle (2/3 × 2/3 = 4/9) is spent on debug
[Figure: development time line: Specify, Design, Detect, Localise, Correct, with the Verification and Debug spans marked]
Bugs are getting „smarter“
CREDES Summer School, June 2-3, 2011, Tallinn, Estonia
Traditional debug flow
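[Figure: traditional debug flow: Spec and Design enter Verification, which reports "Error!" together with counter-examples (waveforms), failed assertions, ...; the step back to the design is marked "???"]
• Too much information
• Too little information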
Automated debug flow
[Figure: automated debug flow: Spec and Design enter Verification; on "Error!", error localization and error correction produce a corrected design, repair log, ...]
Outline
• Verification basics
• Automated debug at the gate-level
• RTL debug methods
– Localization: SAT; correction: resynthesis
– Localization: path tracing; correction: mutation
• General discussion, future trends
• Prototype tools, on-going activities
Verification
“To err is human - and to blame it on a computer is even more so.”
Robert Orben
Verification versus test
• The goal of verification is to check whether a system is designed correctly.
• Validation is similar to verification, but the check is made on a prototype device, not a model.
• By (manufacturing) test we understand checking every instance of a produced chip against manufacturing defects.
Abstraction levels and verification
Difficulties in verification
• Errors may be in implementation, specification or verification environment (constraints)
• Bugs in the spec cannot be detected directly, because a reference object is missing. Thus: verification by redundancy.
• Problem: How to assess verification quality i.e. coverage? (except in equivalence checking)
Verification flow
Dynamic verification
Dynamic verification
• Based on simulation
• Code coverage
• Assertions, functional coverage
Formal verification
Dynamic vs formal verification
Automated debug techniques
“Logic is a poor model of cause and effect.”
Gregory Bateson
• Concept of design error:
– Mostly modeled in implementation, sometimes in specification
• Main applications:
– Checking the synthesis tools
– Engineering change, incremental synthesis
– Debugging
Debugging design errors
What leads to debugging?
• Design behavior doesn’t match expected behavior
When does this occur?
• During simulation of design
• Formal tools (property/equivalence check)
• Checkers identify the mismatch
Debugging design errors
Design error diagnosis
• Classification of methods:
– Structure-based/specification-based
– Explicit/implicit fault model (model-free)
– Single/multiple error assumption
– Simulation-based/symbolic
Debugging combinational logic
• Thoroughly studied in 1990s
• Many works by Aas, Abadir, Wahba & Borrione, others
• Also studied at TUT (Ubar & Jutman)
– Used structural BDDs for error localization
Explicit error model (Abadir)
• Functional errors of gate elements
– gate substitution
– extra gate
– missing gate
– extra inverter
– missing inverter
• Connection errors of signal lines
– extra connection
– missing connection
– wrong connection
Missing gate error (Abadir)
Mapping stuck-at faults to design errors
• Abadir: A complete stuck-at (s-a) test detects all single gate replacements (AND, OR, NAND, NOR), extra gates (simple case), missing gates (simple case) and extra wires.
Combinational fault diagnosis
Fault localization by fault table

Fault table (tests T1-T6, faults F1-F7) and test responses (E1-E3):

      F1 F2 F3 F4 F5 F6 F7 | E1 E2 E3
T1     0  1  1  0  0  0  0 |  0  0  1
T2     1  0  0  1  0  0  0 |  0  1  0
T3     1  1  0  1  0  1  0 |  0  1  0
T4     0  1  0  0  1  0  0 |  1  0  1
T5     0  0  1  0  1  1  0 |  1  0  1
T6     0  0  1  0  0  1  1 |  0  0  0

E1: Fault F5 located
E2: Faults F1 and F4 are not distinguishable
E3: No match, diagnosis not possible
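The table lookup above can be sketched in a few lines of Python. This is only an illustration of fault-table matching: the function name `diagnose` is an assumption, and the data are the rows and responses of the table on this slide.

```python
# Fault table from the slide: rows are tests T1..T6, columns are faults F1..F7.
FAULT_TABLE = [
    [0, 1, 1, 0, 0, 0, 0],  # T1
    [1, 0, 0, 1, 0, 0, 0],  # T2
    [1, 1, 0, 1, 0, 1, 0],  # T3
    [0, 1, 0, 0, 1, 0, 0],  # T4
    [0, 0, 1, 0, 1, 1, 0],  # T5
    [0, 0, 1, 0, 0, 1, 1],  # T6
]

# Observed test responses (one entry per test T1..T6), also from the slide.
E1 = [0, 0, 0, 1, 1, 0]
E2 = [0, 1, 1, 0, 0, 0]
E3 = [1, 0, 0, 1, 1, 0]


def diagnose(response):
    """Return the names of all faults whose table column equals the response.

    `diagnose` is a hypothetical helper name, not part of any tool.
    """
    columns = zip(*FAULT_TABLE)  # transpose: one tuple per fault
    return [
        f"F{index}"
        for index, column in enumerate(columns, start=1)
        if list(column) == list(response)
    ]


if __name__ == "__main__":
    for name, response in (("E1", E1), ("E2", E2), ("E3", E3)):
        print(name, diagnose(response))
```

One matching column means the fault is located, several matching columns mean the faults are indistinguishable by this test set, and an empty result means the modeled faults do not explain the response.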
Mapping stuck-at faults to design errors
Distribution of design errors
Explicit model: disadvantages
• High number of errors to model
• Some errors still not modeled
Implicit design error models
• Do not rely on structure
• Circuit under verification as a black box
• I/O pin fault models
Design error correction
• Classification:
– Error matching approach
– Resynthesis approach
Design error correction
• Happens in a loop:
– An error is detected and localized
– Correction step is applied
– Corrected design must be reverified
– ...
• Until the design passes verification
Ambiguity of error location
• Since there is more than one way to synthesize a given function, there may be more than one way to model the error in an incorrect implementation
• Thus, the correction can be made at different locations
Crash course on SAT
Digital Systems Verification course
Satisfiability aka SAT
• SAT: a Boolean function is satisfiable iff there exists a variable assignment to make it evaluate to TRUE
• The Boolean function must be represented in conjunctive normal form (CNF).
Satisfiability aka SAT
• The SAT instance is transformed to CNF (i.e. product of sums).
• The sums are called clauses.
• If every clause has at most 2 literals, the problem is 2-SAT
• 2-SAT is solved in polynomial time; 3-SAT is an NP-complete problem
• N-SAT can be reduced to 3-SAT
SAT for circuits
• Characteristic function
• Build CNF for logic gates using implication:
• a → b = ¬a + b
a b a→b
0 0 1
0 1 1
1 0 0
1 1 1
SAT for circuits
• Implications for AND-gate: ¬a → ¬c & ¬b → ¬c & ¬c → (¬a + ¬b)
• Characteristic function for AND as a CNF: (a + ¬c) (b + ¬c) (c + ¬a + ¬b)
[Figure: AND gate (&) with inputs a, b and output c]
SAT for circuits
• Implications for OR-gate: a → c & b → c & c → (a + b)
• Characteristic function for OR as a CNF: (¬a + c) (¬b + c) (¬c + a + b)
[Figure: OR gate (1) with inputs a, b and output c]
SAT for circuits
Characteristic function for the circuit:
(a+¬d)(b+¬d)(d+¬a+¬b)(¬c+¬e)(c+e)(¬d+f)(¬e+f)(¬f+d+e)
[Figure: circuit with an AND gate d = a & b, an inverter e = ¬c and an OR gate f = d + e]
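As a rough illustration, the characteristic function of this circuit can be assembled from per-gate CNF templates and checked by exhaustive enumeration. The helper names (`and_gate`, `or_gate`, `not_gate`, `satisfying_assignments`) and the integer encoding of literals are assumptions made for this sketch; a real flow would hand the clauses to a SAT solver instead of enumerating.

```python
from itertools import product

# Variables are numbered 1..6; a negative number denotes a negated literal.
A, B, C, D, E, F = range(1, 7)


def and_gate(a, b, c):
    """CNF of c = a AND b: (a + ¬c)(b + ¬c)(c + ¬a + ¬b)."""
    return [[a, -c], [b, -c], [c, -a, -b]]


def or_gate(a, b, c):
    """CNF of c = a OR b: (¬a + c)(¬b + c)(¬c + a + b)."""
    return [[-a, c], [-b, c], [-c, a, b]]


def not_gate(a, c):
    """CNF of c = NOT a: (¬a + ¬c)(a + c)."""
    return [[-a, -c], [a, c]]


# Circuit from the slide: d = a AND b, e = NOT c, f = d OR e.
CNF = and_gate(A, B, D) + not_gate(C, E) + or_gate(D, E, F)


def satisfying_assignments(cnf, n_vars, fixed=None):
    """Yield every assignment (dict var -> bool) that satisfies all clauses.

    Brute force over 2**n_vars assignments; `fixed` pins selected variables.
    """
    fixed = fixed or {}
    for bits in product([False, True], repeat=n_vars):
        assign = dict(enumerate(bits, start=1))
        if any(assign[var] != value for var, value in fixed.items()):
            continue
        if all(any(assign[abs(lit)] == (lit > 0) for lit in clause)
               for clause in cnf):
            yield assign


if __name__ == "__main__":
    # Which input patterns drive the output f to 0?
    for s in satisfying_assignments(CNF, 6, fixed={F: False}):
        print({name: int(s[var]) for name, var in zip("abc", (A, B, C))})
```

Every satisfying assignment of the characteristic function is a consistent simulation of the circuit, so fixing an output value turns the SAT query into a search for input patterns that justify it.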
SAT-based RTL debug
• Mux-enrichment
– Muxes added to RTL code blocks
– Mux select values select free inputs for the symptom blocks
– Synthesis is applied to find logic expressions generating the signatures for these free inputs
• Cardinality constraints
• Test vector constraints
Smith, Veneris, et al., TCAD, 2005
SAT-based RTL debug
a) Mux enrichment, b) cardinality constraints
SAT-based RTL debug
• SAT provides locations of signals where errors can be corrected
• Multiple errors considered!
• They also provide the partial truth table of the fix
• Correction by resynthesis
• This is also a disadvantage:
– Why should we want to replace a bug with a more difficult one?
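The idea of mux enrichment with a cardinality constraint can be illustrated on a toy gate-level example. The sketch below is not the method of Smith, Veneris, et al.: it replaces the SAT solver by brute-force enumeration, and the circuit, the injected bug (an OR gate where an AND was intended) and all names are assumptions chosen for illustration.

```python
from itertools import combinations, product

ORDER = ["d", "e", "f"]  # gates in topological order

# Faulty implementation (assumed bug): gate d should be AND but is OR.
GATES = {
    "d": lambda v: v["a"] | v["b"],
    "e": lambda v: 1 - v["c"],
    "f": lambda v: v["d"] | v["e"],
}


def spec(t):
    """Expected output: f = (a AND b) OR (NOT c)."""
    return (t["a"] & t["b"]) | (1 - t["c"])


def simulate(t, free):
    """Evaluate the circuit; a gate whose mux is selected takes its free value."""
    v = dict(t)
    for gate in ORDER:
        v[gate] = free[gate] if gate in free else GATES[gate](v)
    return v["f"]


def suspects(tests, cardinality=1):
    """Return {selected gates: partial truth table of the fix}.

    A gate set is a suspect if, for every test vector, some choice of free
    values at the selected gates makes the output match the specification.
    """
    found = {}
    for selected in combinations(ORDER, cardinality):
        fix = []
        for t in tests:
            good = [vals for vals in product((0, 1), repeat=cardinality)
                    if simulate(t, dict(zip(selected, vals))) == spec(t)]
            if not good:
                break  # this vector cannot be repaired at these locations
            fix.append((t, good))
        else:
            found[selected] = fix
    return found


TESTS = [dict(zip("abc", bits)) for bits in product((0, 1), repeat=3)]

if __name__ == "__main__":
    for selected, fix in suspects(TESTS).items():
        print(selected, [(tuple(t.values()), good) for t, good in fix])
```

With cardinality 1 the sketch reports both the faulty gate d and the output gate f, which mirrors the ambiguity of error location: the recorded free values are the partial truth table from which a fix would be resynthesized.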
Path tracing for localization
• One of the first debug methods
• Backtracing mismatched outputs (sometimes also matched outputs)
• Dynamic slicing → critical path tracing (RTL)
Mutation-based correction
• Locate error suspects by backtracing
• Correct by mutating the faulty block (replace by a different function from a preset library)
• An error-matching approach
Testbench-based approach
[Figure: 1. Identify injection location; 2. Apply mutation operators accordingly. Original system description: if (fn==1) … else if (fn==2) …; injected system description: if (fn==4) … else if (fn==5) …]
Arithmetic Operator Replacement (AOR)
• Set of arithmetic operators = {addition, subtraction, multiplication, division, modulo}
• Replace each occurrence of arithmetic operator with all the other operators in the set
a = b + c;
a = b - c;
a = b * c;
a = b / c;
a = b % c;
Logical Connector Replacement (LCR)
• Set of logical connectors = {and, nand, nor, or, xor}
• Replace each occurrence of logical connector with all the other connectors in the set
if (a & b) …
if !(a & b) …
if !(a | b) …
if (a | b) …
if (a ^ b) …
Relational Operator Replacement (ROR)
• Set of relational operators = {equal, not_equal, greater_than, less_than, greater_than_or_equal, less_than_or_equal}
• Replace each occurrence of relational operator with all the other operators in the set
if (a == b) …
if (a != b) …
if (a > b) …
if (a < b) …
if (a >= b) …
if (a <= b) …
Unary Operator Injection (UOI)
• Set of unary operators = {negative, inversion}
• Replace each occurrence of unary operator with the other operator in the set
a = !b;
a = ~b;
More mutation examples
• Constant value mutation
• Replacing signals with other signals
• Mutating control constructs
• .....
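The operator-replacement idea, combined with the error-matching style of correction, can be sketched on Python statements using the standard `ast` module. This is a minimal illustration and not the tooling referred to in the slides: only arithmetic operator replacement is shown, and the names `aor_mutants`, `run` and `repair`, the faulty statement and the test data are all assumptions.

```python
import ast

# Arithmetic operator set: addition, subtraction, multiplication, division, modulo.
AOR = [ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Mod]


def aor_mutants(source):
    """Return all single-operator AOR mutants of a statement (needs Python 3.9+)."""
    tree = ast.parse(source)
    targets = [node for node in ast.walk(tree)
               if isinstance(node, ast.BinOp) and type(node.op) in AOR]
    mutants = []
    for node in targets:
        original = node.op
        for op in AOR:
            if op is not type(original):
                node.op = op()
                mutants.append(ast.unparse(tree))
        node.op = original  # restore before mutating the next occurrence
    return mutants


def run(statement, inputs, target="a"):
    """Execute one assignment on the given inputs and return the target value."""
    env = dict(inputs)
    try:
        exec(statement, {}, env)
    except ZeroDivisionError:
        return None
    return env[target]


def repair(source, tests):
    """Return the mutants that pass every (inputs, expected) test."""
    return [m for m in aor_mutants(source)
            if all(run(m, inputs) == expected for inputs, expected in tests)]


if __name__ == "__main__":
    faulty = "a = b + c"  # assumed bug: subtraction was intended
    print(aor_mutants(faulty))
    print(repair(faulty, [({"b": 5, "c": 3}, 2)]))
    print(repair(faulty, [({"b": 5, "c": 3}, 2), ({"b": 7, "c": 2}, 5)]))
```

With a single test vector two mutants survive (subtraction and modulo both give 2), and the second vector removes the wrong one, which shows how strongly the proposed correction depends on the stimuli.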
Approaches for SW & HW
• Vidroha Debroy and W. Eric Wong, Using Mutation to Automatically Suggest Fixes for Faulty Programs, Software Testing, Verification and Validation Conf., June 2010.
• Raik, J.; Repinski, U.; et al. High-level design error diagnosis using backtrace on decision diagrams. 28th Norchip Conference 15-16 November 2010.
Motivational example
IF res = 1 THEN state:=s0;
ELSE
  CASE state IS
    WHEN s0 => a:=in1; b:=in2; ready:=0; state:=s1;
    WHEN s1 => IF a≠b THEN state:=s2; ELSE state:=s5; END IF;
    WHEN s2 => IF a>b THEN state:=s3; ELSE state:=s4; END IF;
    WHEN s3 => a:=a-b; state:=s1;
    WHEN s4 => b:=b-a; state:=s1;
    WHEN s5 => ready:=1; state:=s5;
  END CASE;
END IF;
[Figure: a) the behavioral description; b) its decision diagrams for the variables state, ready, b and a, with decision nodes res, state, a≠b and a>b. In the erroneous design the diagram for b contains a-b instead of b-a, i.e. b:=a-b.]
Motivational example
Passed sequence:

res in1 in2 state  a  b     ready
1   4   2   -      -  -     -
0   -   -   s0     4  2     0
0   -   -   s1     4  2     0
0   -   -   s2     4  2     0
0   -   -   s3     4  2     0
0   -   -   s1     2  2     0
0   -   -   s5     2  2     0
0   -   -   s5     2  2     1

Failed sequence (expected/observed where the two differ):

res in1 in2 state  a  b     ready
1   2   4   -      -  -     -
0   -   -   s0     2  4     0
0   -   -   s1     2  4     0
0   -   -   s2     2  4     0
0   -   -   s4     2  4     0
0   -   -   s1     2  2/-2  0
0   -   -   s5/s2  2  2/-2  0
0   -   -   s5/s3  2  2/-2  1/0
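The two sequences can be reproduced with a small state-machine simulation. This is a sketch under assumptions: the function name `run_gcd`, the step bound and the way the trace is recorded (one entry per executed state, without the reset cycle) are choices made here and do not match the clock-accurate rows of the tables.

```python
def run_gcd(in1, in2, s4_update, max_steps=8):
    """Simulate the example state machine and return (trace, a, b, ready).

    `s4_update(a, b)` is the assignment executed in state s4, so that the
    correct version (b - a) and the erroneous one (a - b) can be compared.
    """
    state, a, b, ready = "s0", None, None, 0
    trace = []
    for _ in range(max_steps):
        trace.append(state)
        if state == "s0":
            a, b, ready, state = in1, in2, 0, "s1"
        elif state == "s1":
            state = "s2" if a != b else "s5"
        elif state == "s2":
            state = "s3" if a > b else "s4"
        elif state == "s3":
            a, state = a - b, "s1"
        elif state == "s4":
            b, state = s4_update(a, b), "s1"
        else:  # s5: result is ready, stop simulating
            ready = 1
            break
    return trace, a, b, ready


def correct(a, b):
    return b - a


def buggy(a, b):
    return a - b  # design error: operands swapped


if __name__ == "__main__":
    print("passed:", run_gcd(4, 2, buggy))    # s4 is never executed
    print("failed:", run_gcd(2, 4, buggy))    # b becomes -2, ready stays 0
    print("fixed: ", run_gcd(2, 4, correct))
```

The stimulus 4, 2 never reaches state s4, so the error stays hidden, while 2, 4 activates it and the machine never arrives at s5.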
Motivational example
[Figure: backtrace cone of the passed sequence, traced back from the outputs ready and b through the executed assignments and conditions]
[Figure: backtrace cone of the failed sequence, traced back from the outputs ready and b through the executed assignments and conditions]
Statistical analysis
• Ranking according to suspiciousness:
[Figure: suspiciousness score per circuit block]
Fault localization experiments
          success rate,           average resolution,   worst resolution,
          # detected functions    # suspects            # suspects
Design    step1     step2         step1     step2       step1     step2
gcd       2/2       2/2           3         1           3         1
diffeq    8/8       8/8           3.3       1.9         5.6       2.8
risc      16/16     13/16         7.6       1.4         11.6      2.3
crc       25/25     20/25         17.3      2.4         21        7

Step1: Critical path tracing of mismatched outputs (max Failed)
Step2: Max ratio Failed/(Passed+Failed) of backtrace cones
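One possible reading of the two steps is sketched below. It is an interpretation, not the published algorithm: Step1 keeps the blocks that occur in the largest number of failed backtrace cones, Step2 keeps those with the highest Failed/(Passed+Failed) ratio, and the function name and the block names in the example are hypothetical.

```python
from collections import Counter


def rank_suspects(failed_cones, passed_cones):
    """Return (step1, step2) suspect lists from backtrace cones.

    Each cone is a collection of block names reached by backtracing one
    failed or passed sequence.
    """
    failed = Counter(block for cone in failed_cones for block in set(cone))
    passed = Counter(block for cone in passed_cones for block in set(cone))

    # Step1: blocks contained in the maximum number of failed cones.
    max_failed = max(failed.values())
    step1 = sorted(b for b, n in failed.items() if n == max_failed)

    # Step2: among those, the maximum ratio Failed / (Passed + Failed).
    ratio = {b: failed[b] / (failed[b] + passed[b]) for b in step1}
    best = max(ratio.values())
    step2 = sorted(b for b in step1 if ratio[b] == best)
    return step1, step2


if __name__ == "__main__":
    # Hypothetical cones: the failed run executes sub_b, the passed run sub_a.
    failed_cones = [{"in1", "in2", "cmp_ne", "cmp_gt", "sub_b"}]
    passed_cones = [{"in1", "in2", "cmp_ne", "cmp_gt", "sub_a"}]
    print(rank_suspects(failed_cones, passed_cones))
```

In this example Step1 leaves five suspects and Step2 narrows them down to the single block that only the failed sequence exercises.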
Advantages & open questions
• Mutation-based repair is readable
• Helps keeping user in the loop
• Provides a „global“ repair, for all stimuli
• How does this backtracing based method perform in the case of multiple errors?
• What would be a good fault model for high-level design errors?
Future trends
• The quality of localization and correction depends on the input stimuli
• Thus, diagnostic test generation is needed
• Readable, small correction preferred:
– Correction normally holds only wrt given input vectors (e.g. resynthesis)
– Why should we replace an easily detectable bug with a more difficult one?!
Idea: HLDD-based correction
• A canonical form of high-level decision diagrams (HLDD) using characteristic polynomials
• It allows fast probabilistic proof of equivalence of two different designs.
• Idea: Extend it towards correction
Prototype tools, activities
DIAMOND Kick-off, Tallinn, February 2-3, 2010
FP7 Project DIAMOND
• Start January 2010, duration 3 years
• Total budget 3.8M € – EU contribution 2.9M €
• Effort 462.5 PM
The IBM logo is a registered trademark of International Business Machines Corporation (IBM) in the United States and other countries.
The DIAMOND concept
[Figure: design flow (Specification, Implementation, Post-Silicon) exposed to design errors, soft errors, ...; holistic fault models, fault diagnosis and fault correction lead to reliable nanoelectronics systems]
FORENSIC
• FoREnSiC – Formal Repair Engine for Simple C
• For debugging system-level HW
• Idea by TUG, UNIB and TUT at DATE’10
• Front-end converting simple C descriptions to flowchart model completed
• 1st release expected by the end of 2011
Forensic Flow
APRICOT: Design Verification
Extensions of BDD: HLDD, THLDD
APriCoT Verification System
– Assertion/Property checkIng, Code coverage & Test generation
– The tools run on a uniform design model based on high-level decision diagrams.
– The functionality currently includes
• test generation,
• code coverage analysis,
• assertion-checking,
• mutation analysis and
• design error localization
ZamiaCAD: IDE for HW Design
• ZamiaCAD is an Eclipse-based development environment for hardware designs
• Design entry
• Analysis
• Navigation
• Simulation
• Scalable!
• Co-operation with IBM Germany, R. Dorsch
To probe further...
Functional Design Errors in Digital Circuits: Diagnosis, Correction and Repair
K. H. Chang, I. L. Markov, V. Bertacco
Publisher: Springer
Pub Date: 2009