Techniques for automated localization and correction of design errors

Jaan Raik, Tallinn University of Technology


Design error debug

“There has never been an unexpectedly short debugging period in the history of computers.”

Steven Levy

Designs are getting bigger

One cannot start without referring to Moore's Law

25-30 % annually decreasing cost per function

15 percent annual growth of the market for IC

• But …

The cost of chip design keeps on growing.

• In 1981, development of a leading-edge CPU cost 1 M$

• …today it costs more than 300 M$ !!!

• Why do the costs increase ???

Designs are getting costlier

Design automation crisis

• Productivity gap: 58% versus 21% annual growth (technology's capabilities versus designer's productivity)

[Figure: technology's capabilities (transistors on the die) versus designer's productivity over time, up to today. Design-automation milestones between < 1979 and ~1996-...: schematic entry, simulation, placement & routing, hierarchy and generators, logic synthesis, specialized high-level synthesis, high-level synthesis / system-level synthesis. Effort is given in person-months per 20 000 logic gates for system design, logic design and physical design.]

Verification and debugging

• Debug = Localization + Correction
• ~2/3 of development time for verification
• ~2/3 of verification time for debug
• Thus debug takes nearly half of the development cycle (2/3 × 2/3 = 4/9 ≈ 44%)

[Figure: development time split across Specify, Design, Detect, Localise, Correct, with the Verification and Debug portions marked.]

Effort required by verification is no news, debug is the major issue

Bugs are getting „smarter“

CREDES Summer School, June 2-3, 2011, Tallinn, Estonia

Traditional debug flow

[Figure: Spec and Design feed Verification. On Error!, the output is counter-examples (waveforms), failed assertions, ..., followed by ???]

• Too much information
• Too little information

Automated debug flow

[Figure: Spec and Design feed Verification. On Error!, error localization and error correction produce a corrected design, repair log, ...]

Outline

• Verification basics

• Automated debug at the gate-level

• RTL debug methods
– Localization: SAT; correction: resynthesis
– Localization: path tracing; correction: mutation

• General discussion, future trends

• Prototype tools, on-going activities


Verification

“To err is human - and to blame it on a computer is even more so.”

Robert Orben

Verification versus test

• The goal of verification is to check if a system is designed correctly.

• Validation is similar to verification, but we check a prototype device, not a model.

• By (manufacturing) test we understand checking every instance of a produced chip against manufacturing defects.

Abstraction levels and verification


Difficulties in verification

• Errors may be in implementation, specification or verification environment (constraints)

• No way to detect bugs in the spec, because reference object is missing. Thus: verification by redundancy.

• Problem: How to assess verification quality i.e. coverage? (except in equivalence checking)

Verification flow

Dynamic verification

• Based on simulation

• Code coverage

• Assertions, functional coverage

Formal verification

Dynamic vs formal verification

Automated debug techniques

“Logic is a poor model of cause and effect.”

Gregory Bateson

• Concept of design error:
– Mostly modeled in implementation, sometimes in specification

• Main applications:
– Checking the synthesis tools
– Engineering change, incremental synthesis
– Debugging

Debugging design errors

What leads to debugging?

• Design behavior doesn’t match expected behavior

When does this occur?

• During simulation of design

• Formal tools (property/equivalence check)

• Checkers identify the mismatch


Design error diagnosis

• Classification of methods:
– Structure-based/specification-based
– Explicit/implicit fault model (model-free)
– Single/multiple error assumption
– Simulation-based/symbolic

Debugging combinational logic

• Thoroughly studied in 1990s

• Many works by Aas, Abadir, Wahba & Borrione, others

• Also studied at TUT (Ubar & Jutman)
– Used structural BDDs for error localization

Explicit error model (Abadir)

• Functional errors of gate elements
– gate substitution
– extra gate
– missing gate
– extra inverter
– missing inverter

• Connection errors of signal lines
– extra connection
– missing connection
– wrong connection

Missing gate error (Abadir)


Mapping stuck-at faults to design errors

• Abadir: a complete stuck-at (s-a) test set detects all single gate replacements (AND, OR, NAND, NOR), extra gates (simple case), missing gates (simple case) and extra wires.

Combinational fault diagnosis

Fault localization by fault table

      F1  F2  F3  F4  F5  F6  F7
T1     0   1   1   0   0   0   0
T2     1   0   0   1   0   0   0
T3     1   1   0   1   0   1   0
T4     0   1   0   0   1   0   0
T5     0   0   1   0   1   1   0
T6     0   0   1   0   0   1   1

Test responses:

      E1  E2  E3
T1     0   0   1
T2     0   1   0
T3     0   1   0
T4     1   0   1
T5     1   0   1
T6     0   0   0

• E1: fault F5 located
• E2: faults F1 and F4 are not distinguishable
• E3: no match, diagnosis not possible
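The fault-table lookup can be sketched in a few lines of Python. This is only an illustration of the matching step, using the table and responses from the slide; the function name `diagnose` is an assumption made for this sketch.

```python
# Fault table from the slide: rows are tests T1..T6, columns are faults F1..F7.
FAULT_TABLE = [
    [0, 1, 1, 0, 0, 0, 0],  # T1
    [1, 0, 0, 1, 0, 0, 0],  # T2
    [1, 1, 0, 1, 0, 1, 0],  # T3
    [0, 1, 0, 0, 1, 0, 0],  # T4
    [0, 0, 1, 0, 1, 1, 0],  # T5
    [0, 0, 1, 0, 0, 1, 1],  # T6
]

def diagnose(response):
    """Return the names of all faults whose table column equals the response."""
    n_faults = len(FAULT_TABLE[0])
    columns = [[row[j] for row in FAULT_TABLE] for j in range(n_faults)]
    return [f"F{j + 1}" for j, col in enumerate(columns) if col == list(response)]

# Observed responses E1..E3, one entry per test T1..T6.
E1 = [0, 0, 0, 1, 1, 0]
E2 = [0, 1, 1, 0, 0, 0]
E3 = [1, 0, 0, 1, 1, 0]

print(diagnose(E1))  # one candidate: the fault is located
print(diagnose(E2))  # two candidates: not distinguishable
print(diagnose(E3))  # no candidate: diagnosis not possible
```

A response that matches several columns cannot be resolved by this test set, and a response that matches none falls outside the single-fault assumption of the table.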

Mapping stuck-at faults to design errors

Distribution of design errors


Explicit model: disadvantages

• High number of errors to model

• Some errors still not modeled


Implicit design error models

• Do not rely on structure

• Circuit under verification as a black box

• I/O pin fault models

Design error correction

• Classification:
– Error matching approach
– Resynthesis approach

• Happens in a loop:
– An error is detected and localized
– Correction step is applied
– Corrected design must be reverified
– ...

• Until the design passes verification

Ambiguity of error location

• Since there is more than one way to synthesize a given function, it is possible that there is more than one way to model the error in an incorrect implementation

• correction can be made at different locations

Crash course on SAT


Digital systems verification course

Satisfiability aka SAT

• SAT: a Boolean function is satisfiable iff there exists a variable assignment to make it evaluate to TRUE

• The Boolean function must be represented as a CNF:


Satisfiability aka SAT

• The SAT instance is transformed to CNF (i.e. product of sums).

• The sums are called clauses.

• If every clause has at most 2 literals, the problem is 2-SAT

• 2-SAT is solved in polynomial time; 3-SAT is an NP-complete problem

• N-SAT can be reduced to 3-SAT
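To make the definition concrete, here is a minimal brute-force satisfiability check in Python. It is a sketch for tiny examples only, not how real SAT solvers work; the clause encoding (signed integers, as in the DIMACS convention) and the name `find_model` are assumptions of this sketch.

```python
from itertools import product

def find_model(cnf, n_vars):
    """Return the first satisfying assignment (tuple of bools) or None.

    cnf is a list of clauses; a clause is a list of non-zero integers,
    where k means variable k and -k means its negation.
    """
    for bits in product([False, True], repeat=n_vars):
        # The CNF holds if every clause contains at least one true literal.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            return bits
    return None

# (x1 + x2)(¬x1 + x2)(¬x2 + x3) is satisfiable.
print(find_model([[1, 2], [-1, 2], [-2, 3]], 3))
# (x1)(¬x1) is not.
print(find_model([[1], [-1]], 1))
```

The enumeration visits up to 2^n assignments, which is exactly the cost that practical solvers try to avoid.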


SAT for circuits

• Characteristic function

• Build CNF for logic gates using implication:

• a → b = ¬a + b

a  b  a → b
0  0  1
0  1  1
1  0  0
1  1  1


• Implications for AND-gate: (¬a → ¬c) & (¬b → ¬c) & (¬c → ¬a + ¬b)

• Characteristic function for AND as a CNF: (a + ¬c)(b + ¬c)(c + ¬a + ¬b)

[Figure: AND gate with inputs a, b and output c]


• Implications for OR-gate: (a → c) & (b → c) & (c → a + b)

• Characteristic function for OR as a CNF: (¬a + c)(¬b + c)(¬c + a + b)

[Figure: OR gate with inputs a, b and output c]


Characteristic function for the circuit:

(a+¬d)(b+¬d)(d+¬a+¬b)(¬c+¬e)(c+e)(¬d+f)(¬e+f)(¬f+d+e)

[Figure: circuit with d = a AND b, e = NOT c, f = d OR e]
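The circuit CNF can be checked exhaustively. The sketch below, in Python, assumes the gate structure implied by the clauses (d = a AND b, e = NOT c, f = d OR e) and confirms that the CNF is true exactly for the signal assignments that are consistent with the circuit. Variable numbering and helper names are choices made for this illustration.

```python
from itertools import product

# Variables a..f are numbered 1..6; a negative number is a negated literal.
A, B, C, D, E, F = range(1, 7)
CIRCUIT_CNF = [
    [A, -D], [B, -D], [D, -A, -B],   # d = a AND b
    [-C, -E], [C, E],                # e = NOT c
    [-D, F], [-E, F], [-F, D, E],    # f = d OR e
]

def cnf_holds(cnf, bits):
    """True if every clause has at least one literal satisfied by bits."""
    return all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf)

def circuit_consistent(bits):
    """True if the internal and output signals agree with the gate functions."""
    a, b, c, d, e, f = bits
    return d == (a and b) and e == (not c) and f == (d or e)

all_assignments = list(product([False, True], repeat=6))
models = [bits for bits in all_assignments if cnf_holds(CIRCUIT_CNF, bits)]
print(len(models))  # one model per input combination of a, b, c
```

Adding the unit clause `[-F]` to the list would ask the solver whether the output can ever be 0, which is how such a characteristic function is typically queried.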

SAT-based RTL debug

• Mux-enrichment
– Muxes added to RTL code blocks
– Mux select values select free inputs for the symptom blocks
– Synthesis is applied to find logic expressions generating the signatures for these free inputs

• Cardinality constraints

• Test vector constraints

Smith, Veneris, et al., TCAD, 2005

[Figure: a) mux enrichment, b) cardinality constraints]

SAT-based RTL debug

• SAT provides locations of signals where errors can be corrected

• Multiple errors considered!

• They also provide the partial truth table of the fix

• Correction by resynthesis

• This is also a disadvantage:

– Why should we want to replace a bug with a more difficult one?
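The interplay of mux enrichment, cardinality constraints and test vector constraints can be illustrated on a toy netlist. The Python sketch below is not the method of the cited paper: it replaces the SAT solver with brute-force enumeration, and the circuit, the injected bug (an AND gate implemented as OR) and all names are hypothetical. A gate is a suspect if, with its output freed by a mux, some free value makes every test vector produce the expected output.

```python
from itertools import combinations, product

GATES = ["d", "e", "f"]

def simulate(a, b, c, free):
    """Evaluate the buggy netlist; a gate listed in `free` outputs the free value."""
    d = free["d"] if "d" in free else (a | b)   # injected bug: should be a & b
    e = free["e"] if "e" in free else (1 - c)
    f = free["f"] if "f" in free else (d | e)
    return f

def spec(a, b, c):
    """Expected output for a test vector (hypothetical specification)."""
    return (a & b) | (1 - c)

def suspects(tests, k):
    """Gate sets of size <= k (cardinality constraint) that can fix all tests."""
    result = []
    for size in range(1, k + 1):
        for selected in combinations(GATES, size):
            # Test vector constraint: every vector needs some free-value choice
            # for the selected gates that yields the expected output.
            if all(any(simulate(a, b, c, dict(zip(selected, vals))) == spec(a, b, c)
                       for vals in product([0, 1], repeat=size))
                   for a, b, c in tests):
                result.append(selected)
    return result

TESTS = list(product([0, 1], repeat=3))
print(suspects(TESTS, 1))
```

The free values chosen per test vector form the partial truth table of the fix mentioned above. In this toy case both the buggy gate and the output gate are reported, which also illustrates the ambiguity of error location.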

Path tracing for localization

• One of the first debug methods

• Backtracing mismatched outputs (sometimes also matched outputs)

• Dynamic slicing → critical path tracing (RTL)


Mutation-based correction

• Locate error suspects by backtracing

• Correct by mutating the faulty block (replace by a different function from a preset library)

• An error-matching approach
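A minimal Python sketch of the correction step, assuming the suspect block is a binary operator and the preset library holds three alternatives. The library contents, the test format and the name `repair` are illustrative assumptions, not part of any tool described in these slides.

```python
import operator

# Hypothetical preset library of replacement functions.
LIBRARY = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def repair(tests, faulty_op):
    """Return the library operators (other than faulty_op) that pass every test.

    tests is a list of ((x, y), expected) pairs for the suspect block.
    """
    return [name for name, fn in LIBRARY.items()
            if name != faulty_op
            and all(fn(x, y) == expected for (x, y), expected in tests)]

# Intended function is x - y, but the block was implemented with "+".
print(repair([((4, 2), 2), ((2, 4), -2)], "+"))
# A weak test set leaves several mutants that look like valid repairs.
print(repair([((2, 2), 4)], "-"))
```

The second call shows why the quality of the repair depends on the stimuli: with a single test, two different mutations pass.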

Testbench-based approach

[Figure: 1. Identify injection location; 2. Apply mutation operators accordingly. An original system description is turned into an injected system description, illustrated with code fragments such as if (fn==1) ... else if (fn==2) ... and if (fn==4) ... else if (fn==5) ...]

Arithmetic Operator Replacement (AOR)

• Set of arithmetic operators = {addition, subtraction, multiplication, division, modulo}

• Replace each occurrence of arithmetic operator with all the other operators in the set

a = b + c;

a = b - c;

a = b * c;

a = b / c;

a = b % c;
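Generating the AOR mutants of a statement can be sketched as follows in Python. This is a deliberately naive, character-based illustration: it assumes single-character operators and would mishandle unary minus, comments or string literals; a real mutation tool would work on the parsed design. The name `aor_mutants` is an assumption of this sketch.

```python
AOR = ["+", "-", "*", "/", "%"]

def aor_mutants(statement):
    """Return all statements obtained by replacing one arithmetic operator."""
    mutants = []
    for i, ch in enumerate(statement):
        if ch in AOR:
            for op in AOR:
                if op != ch:
                    # Replace only the operator at position i.
                    mutants.append(statement[:i] + op + statement[i + 1:])
    return mutants

for mutant in aor_mutants("a = b + c;"):
    print(mutant)
```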

Logical Connector Replacement (LCR)

• Set of logical connectors = {and, nand, nor, or, xor}
• Replace each occurrence of logical connector with all the other connectors in the set

if (a & b) …

if (!(a & b)) …

if (!(a | b)) …

if (a | b) …

if (a ^ b) …

Relational Operator Replacement (ROR)

• Set of relational operators = {equal, not_equal, greater_than, less_than, greater_than_or_equal, less_than_or_equal}

• Replace each occurrence of relational operator with all the other operators in the set

if (a == b) …

if (a != b) …

if (a > b) …

if (a < b) …

if (a >= b) …

if (a <= b) …

Unary Operator Injection (UOI)

• Set of unary operators = {negative, inversion}
• Replace each occurrence of unary operator with the other operator in the set

a = !b; a = ~b;

More mutation examples

• Constant value mutation

• Replacing signals with other signals

• Mutating control constructs

• .....


Approaches for SW & HW

• Vidroha Debroy and W. Eric Wong, Using Mutation to Automatically Suggest Fixes for Faulty Programs, Software Testing, Verification and Validation Conf., June 2010.

• Raik, J.; Repinski, U.; et al. High-level design error diagnosis using backtrace on decision diagrams. 28th Norchip Conference 15-16 November 2010.

Motivational example

IF res = 1 THEN
  state := s0;
ELSE
  CASE state IS
    WHEN s0 => a := in1; b := in2; ready := 0; state := s1;
    WHEN s1 => IF a≠b THEN state := s2; ELSE state := s5; END IF;
    WHEN s2 => IF a>b THEN state := s3; ELSE state := s4; END IF;
    WHEN s3 => a := a-b; state := s1;
    WHEN s4 => b := b-a; state := s1;
    WHEN s5 => ready := 1; state := s5;
  END CASE;
END IF;

[Figure: a) the behavioral description above; b) its high-level decision diagram (HLDD) representation, with graphs for state, ready, a and b. The injected design error replaces b-a with a-b in state s4 (b:=a-b).]


Passed sequence:

res  in1  in2  state  a  b  ready
1    4    2    -      -  -  -
0    -    -    s0     4  2  0
0    -    -    s1     4  2  0
0    -    -    s2     4  2  0
0    -    -    s3     4  2  0
0    -    -    s1     2  2  0
0    -    -    s5     2  2  0
0    -    -    s5     2  2  1

Failed sequence (where two values are given: expected / observed):

res  in1  in2  state    a  b       ready
1    2    4    -        -  -       -
0    -    -    s0       2  4       0
0    -    -    s1       2  4       0
0    -    -    s2       2  4       0
0    -    -    s4       2  4       0
0    -    -    s1       2  2 / -2  0
0    -    -    s5 / s2  2  2 / -2  0
0    -    -    s5 / s3  2  2 / -2  1 / 0
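The two sequences can be reproduced with a small Python simulation of the example state machine. This is a sketch under stated assumptions: reset is modeled simply as starting in s0, the `buggy` flag injects the b:=a-b error in state s4, and `max_steps` bounds the run because the erroneous design does not terminate on the failing stimulus. The function name `run_gcd` is hypothetical.

```python
def run_gcd(in1, in2, buggy=False, max_steps=20):
    """Simulate the FSM; return (ready, a, b) after at most max_steps steps."""
    state, a, b, ready = "s0", None, None, 0
    for _ in range(max_steps):
        if state == "s0":
            a, b, ready, state = in1, in2, 0, "s1"
        elif state == "s1":
            state = "s2" if a != b else "s5"
        elif state == "s2":
            state = "s3" if a > b else "s4"
        elif state == "s3":
            a, state = a - b, "s1"
        elif state == "s4":
            # The injected design error swaps the operands of the subtraction.
            b = (a - b) if buggy else (b - a)
            state = "s1"
        else:  # s5: result is available
            ready = 1
            break
    return ready, a, b

print(run_gcd(4, 2, buggy=True))   # passes: state s4 is never visited
print(run_gcd(2, 4, buggy=False))  # correct design on the second stimulus
print(run_gcd(2, 4, buggy=True))   # fails: b becomes -2 and ready stays 0
```

The stimulus (4, 2) never exercises s4, so it cannot expose the error; only (2, 4) activates the faulty assignment.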


[Figure: backtrace cones of the passed sequence and of the failed sequence, traced back from the outputs ready and b. The cones contain statements and conditions such as ready:=1, ready:=0, res=1, the state assignments, a=b, a≠b, a>b, a:=in1, b:=in2, a:=a-b and, in the failed sequence, b:=a-b.]

Statistical analysis

• Ranking according to suspiciousness:

[Figure: suspiciousness score plotted for each circuit block]

Fault localization experiments

Design   Success rate           Average resolution     Worst resolution
         (# detected functions) (# suspects)           (# suspects)
         step1     step2        step1     step2        step1     step2
gcd      2/2       2/2          3         1            3         1
diffeq   8/8       8/8          3.3       1.9          5.6       2.8
risc     16/16     13/16        7.6       1.4          11.6      2.3
crc      25/25     20/25        17.3      2.4          21        7

Step1: Critical path tracing of mismatched outputs (max Failed)

Step2: Max ratio Failed/(Passed+Failed) of backtrace cones
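The Step2 ranking can be sketched in Python as follows. The cone contents below are hypothetical stand-ins loosely based on the motivational example, and the scoring shown is only the plain ratio Failed/(Passed+Failed) counted over backtrace cones; the actual tool may differ in details.

```python
from collections import Counter

def rank_suspects(passed_cones, failed_cones):
    """Rank blocks by failed / (passed + failed) cone membership counts.

    Only blocks that occur in at least one failed cone are scored.
    """
    failed = Counter(block for cone in failed_cones for block in set(cone))
    passed = Counter(block for cone in passed_cones for block in set(cone))
    scores = {b: failed[b] / (failed[b] + passed[b]) for b in failed}
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical backtrace cones of one passed and one failed sequence.
passed = [{"a:=in1", "b:=in2", "a:=a-b", "state"}]
failed = [{"a:=in1", "b:=in2", "b:=a-b", "state"}]

for block, score in rank_suspects(passed, failed):
    print(f"{block:8s} {score:.2f}")
```

Blocks that appear only in failed cones get the maximum score, which is why adding passed sequences sharpens the resolution from step1 to step2 in the table above.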

Advantages & open questions

• Mutation-based repair is readable

• Helps keeping user in the loop

• Provides a „global“ repair, for all stimuli

• How does this backtracing based method perform in the case of multiple errors?

• What would be a good fault model for high-level design errors?

Future trends

• The quality of localization and correction is dependent on input stimuli

• Thus, diagnostic test generation needed

• Readable, small correction preferred:
– Correction holds normally only wrt given input vectors (e.g. Resynthesis)
– Why should we replace an easily detectable bug with a more difficult one?!

Idea: HLDD-based correction

• A canonical form of high-level decision diagrams (HLDD) using characteristic polynomials

• It allows fast probabilistic proof of equivalence of two different designs.

• Idea: Extend it towards correction


Prototype tools, activities


DIAMOND Kick-off, Tallinn, February 2-3, 2010


FP7 Project DIAMOND

• Start January 2010, duration 3 years

• Total budget 3.8M € – EU contribution 2.9M €

• Effort 462.5 PM



The DIAMOND concept

[Figure: the design flow (Specification, Implementation, Post-Silicon) is subject to design errors, soft errors, ...; holistic fault models, fault diagnosis and fault correction lead to Reliable Nanoelectronics Systems.]

FORENSIC

• FoREnSiC – Formal Repair Engine for Simple C

• For debugging system-level HW

• Idea by TUG, UNIB and TUT at DATE’10

• Front-end converting simple C descriptions to flowchart model completed

• 1st release expected by the end of 2011

Forensic Flow

APRICOT: Design Verification

Extensions of BDD: HLDD, THLDD

APriCoT Verification System
– Assertion/Property checkIng, Code coverage & Test generation
– The tools run on a uniform design model based on high-level decision diagrams.
– The functionality currently includes
• test generation,
• code coverage analysis,
• assertion-checking,
• mutation analysis and
• design error localization

ZamiaCAD: IDE for HW Design

• ZamiaCAD is an Eclipse-based development environment for hardware designs

• Design entry

• Analysis

• Navigation

• Simulation

• Scalable!

• Co-operation with IBM Germany, R. Dorsch

http://zamiacad.sf.net/

To probe further...

Functional Design Errors in Digital Circuits: Diagnosis, Correction and Repair

K. H. Chang, I. L. Markov, V. Bertacco

Publisher: Springer

Pub Date: 2009