Fault Diagnosis Overview

Page 1: Fault Diagnosis Overview

Fault Diagnosis Overview

David Lavo, UC Santa Cruz

January 13, 2005

Page 2: Fault Diagnosis Overview

©2005 David Lavo Fault Diagnosis Overview 2

Outline

• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Page 3: Fault Diagnosis Overview

What is Fault Diagnosis?

• A guess as to what’s wrong with a malfunctioning circuit
• Narrows the search for physical root cause
• Makes inferences based on observed behavior
• Usually based on the logical operation of the circuit

Page 4: Fault Diagnosis Overview

VLSI Fault Diagnosis (in One Slide)

[Figure: flow diagram with blocks labeled Tests, Defective Circuit, Observed Behavior, Diagnosis Algorithm, Diagnosis (Location or Fault), and Physical Analysis.]

Page 5: Fault Diagnosis Overview

Two Types of Diagnosis

• Circuit Partitioning (“Effect-Cause” Diagnosis)
  – Identify fault-free or possibly-faulty portions
  – Identify suspect components, logic blocks, interconnects
• Model-Based Diagnosis (“Cause-Effect” Diagnosis)
  – Assume one or more specific fault models
  – Compare behavior to fault simulations

Page 6: Fault Diagnosis Overview

Circuit Partitioning

• Separate known-good portions of circuit from likely areas of failure
• Simplest method: identify failing flip-flops
  – Tester can identify failing flops or outputs
  – Input cone of logic is suspect
  – Intersection of multiple cones is highly suspect
  – Single clock pulse with scan can be used for sequential/functional fails
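
The cone-intersection idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular tool: the tiny netlist, its node names (`a`, `g1`, `ff1`, and so on), and the helpers `input_cone` and `suspects` are all hypothetical.

```python
# Hypothetical netlist: each node maps to the nodes that drive it.
# Primary inputs (a, b, c, d) have no drivers.
DRIVERS = {
    "g1": ["a", "b"],
    "g2": ["b", "c"],
    "g3": ["c", "d"],
    "ff1": ["g1", "g2"],
    "ff2": ["g2", "g3"],
}


def input_cone(node, drivers=DRIVERS):
    """Return every node that can influence `node`, including itself."""
    cone, stack = set(), [node]
    while stack:
        current = stack.pop()
        if current not in cone:
            cone.add(current)
            stack.extend(drivers.get(current, []))
    return cone


def suspects(failing_points):
    """Intersect the input cones of all failing flip-flops/outputs."""
    cones = [input_cone(point) for point in failing_points]
    return set.intersection(*cones)


# Both ff1 and ff2 fail, so only logic feeding both is highly suspect.
print(sorted(suspects(["ff1", "ff2"])))
```

With a single failing flip-flop the whole input cone is returned; each additional failing flip-flop can only shrink the suspect set.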

Page 7: Fault Diagnosis Overview

Back-Tracing Failures

Page 8: Fault Diagnosis Overview

aka Effect-Cause Diagnosis

• Reasoning based on observed behavior and expected (good-circuit) functions
• Commonly used at the system and board levels
• Tries to separate good and suspect areas
• Advantage: Simple and general
• Disadvantage: Not very precise, often gives no indication of defect mechanism

Page 9: Fault Diagnosis Overview

Cause-Effect Diagnosis

• Start from possible causes (fault models), compare to observed effects
• A simulator is used to predict behavior of the circuit in the presence of various faults
• Match prediction(s) against observed behavior
• Advantage: Implicates a mechanism as well as a location
• Disadvantage: Can be fooled by unmodeled defects
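
The matching step can be sketched as a lookup of the observed signature among simulated candidate signatures. This is a minimal sketch assuming exact matching only; the fault names, the signatures, and the helper `exact_matches` are made up for illustration.

```python
# Hypothetical simulated signatures: one pass(0)/fail(1) bit per test vector.
CANDIDATE_SIGNATURES = {
    "n12 stuck-at 0": "01010011",
    "n12 stuck-at 1": "10100010",
    "n47 stuck-at 0": "01000101",
    "n47 stuck-at 1": "00011100",
}


def exact_matches(observed, candidates=CANDIDATE_SIGNATURES):
    """Return the faults whose predicted signature equals the observed one."""
    return [fault for fault, predicted in candidates.items() if predicted == observed]


observed_signature = "01000101"  # behavior recorded on the tester (made up)
print(exact_matches(observed_signature))
```

An empty result is one way the "fooled by unmodeled defects" disadvantage shows up: no modeled fault reproduces the observed behavior exactly.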

Page 10: Fault Diagnosis Overview

Cause-Effect Diagnosis

[Figure: block diagram. Tests feed both the Defective Circuit and the Fault Simulator; the Behavior Signature (010001010100010101010 …) and the Candidate Signatures (010100110000101010100 …, 101000100001011101100 …, 010100010100011101100 …, 000111000101010011110 …) feed the Diagnosis Algorithm (Comparison & Conclusion).]

Page 11: Fault Diagnosis Overview

Outline

• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Page 12: Fault Diagnosis Overview

Components of Fault Diagnosis

• Fault models
• Fault simulators
• Fault dictionaries
• Diagnosis algorithms

Page 13: Fault Diagnosis Overview

Fault Models

• A fault model is an abstraction of a type of defect behavior
• A fault instance is the application of a model to a circuit wire, node, gate, etc.
• Used to create and evaluate test sets
• For diagnosis, they can be used to simulate and predict faulty behaviors

Page 14: Fault Diagnosis Overview

Stuck-at Fault Model

• The most-used fault model (by far)
• Simple to simulate and enumerate
• Effective for testing, fault grading, and diagnosis of some defects
• Many defects are not well represented by the stuck-at model

[Figure: Node A stuck-at 1, shown on a gate with inputs A and B. Signals are labeled with fault-free/faulty logic values: 0/1, 0/1, and 1.]
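
A stuck-at fault can be illustrated on a single gate. The gate type in the slide's figure does not survive in this transcript, so the sketch below assumes a 2-input AND gate; the function names are hypothetical. It prints fault-free/faulty output values in the same "0/1" notation the slide uses.

```python
def and_gate(a, b):
    """Fault-free 2-input AND gate."""
    return a & b


def and_gate_a_stuck_at_1(a, b):
    """Same gate with input A stuck-at 1: the applied value of A is ignored."""
    return and_gate(1, b)


# Print fault-free/faulty output values for every input combination.
for a in (0, 1):
    for b in (0, 1):
        good = and_gate(a, b)
        faulty = and_gate_a_stuck_at_1(a, b)
        detected = " <- fault detected" if good != faulty else ""
        print(f"A={a} B={b} out={good}/{faulty}{detected}")
```

Under this assumption only A=0, B=1 exposes the fault: A must be driven to the opposite of the stuck value, and B must let the difference propagate to the output.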

Page 15: Fault Diagnosis Overview

Bridging Fault Model

• Shorts are a common defect type in CMOS
• Different bridging fault models have varying accuracy and precision, from simplistic to very sophisticated
• Difficult or impractical to enumerate

[Figure: Nodes X and Y bridged; node X forces Y to a value of 0. Logic values shown: 0, 1, 1, 1, 0, and 1/0 (fault-free/faulty).]
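
The simplest end of the bridging-model spectrum can be sketched as a dominance bridge, where one node overrides the other. This is one assumed model among several (wired-AND and wired-OR are other common simplifications); the helper `dominance_bridge` and the node count are hypothetical. The second half shows why enumeration is hard.

```python
from math import comb


def dominance_bridge(values, dominant, victim):
    """Return node values with `victim` forced to the value on `dominant`."""
    bridged = dict(values)
    bridged[victim] = values[dominant]
    return bridged


fault_free = {"X": 0, "Y": 1}  # hypothetical fault-free values for one vector
faulty = dominance_bridge(fault_free, dominant="X", victim="Y")
print(f"Y = {fault_free['Y']}/{faulty['Y']}")  # fault-free/faulty

# Why enumeration is hard: stuck-at faults grow linearly with node count
# (two per node), while unordered two-node bridges grow quadratically.
nodes = 10_000
print("single stuck-at faults:", 2 * nodes)
print("possible two-node bridges:", comb(nodes, 2))
```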

Page 16: Fault Diagnosis Overview

Some Diagnostic Fault Models

[Figure: four panels illustrating a Gate Fault, a Net Fault, a Bridging Fault, and a Path Fault.]

Page 17: Fault Diagnosis Overview

Fault Simulators

• A fault simulator can simulate instances of a particular fault model
• Inputs:
  – Circuit (netlist)
  – Test set
  – Faultlist (list of fault instances)
• Output: circuit response
• Usually, simulates the presence of a single fault instance (“single-fault assumption”)
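
The three inputs and one output listed above can be sketched as a toy single-stuck-at fault simulator. This is a sketch under stated assumptions, not a production simulator: the three-gate netlist, the test set, the faultlist, and the names `simulate` and `response` are all hypothetical, and each run injects at most one fault (the single-fault assumption).

```python
# Hypothetical netlist in topological order: (output, gate type, inputs).
NETLIST = [
    ("n1", "AND", ("a", "b")),
    ("n2", "OR", ("b", "c")),
    ("out", "XOR", ("n1", "n2")),
]
GATES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "XOR": lambda x, y: x ^ y,
}
TEST_SET = [
    {"a": 0, "b": 0, "c": 0},
    {"a": 1, "b": 1, "c": 0},
    {"a": 0, "b": 1, "c": 1},
    {"a": 1, "b": 0, "c": 1},
]
FAULTLIST = [("n1", 0), ("n1", 1), ("b", 0), ("b", 1)]  # (node, stuck value)


def simulate(vector, fault=None):
    """Evaluate the circuit for one vector with at most one stuck-at fault."""
    values = dict(vector)
    if fault and fault[0] in values:      # fault on a primary input
        values[fault[0]] = fault[1]
    for output, kind, (in1, in2) in NETLIST:
        values[output] = GATES[kind](values[in1], values[in2])
        if fault and fault[0] == output:  # fault on a gate output
            values[output] = fault[1]
    return values["out"]


def response(fault=None):
    """Circuit response to the whole test set, as a bit string."""
    return "".join(str(simulate(vector, fault)) for vector in TEST_SET)


print("fault-free:", response())
for node, stuck in FAULTLIST:
    print(f"{node} stuck-at {stuck}:", response((node, stuck)))
```

Any fault whose response equals the fault-free response is undetected by this test set, which is also how the same machinery is used for fault grading.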

Page 18: Fault Diagnosis Overview

Fault Dictionaries

• A fault dictionary is a database of the simulated responses for all faults in the faultlist
• Used by some diagnosis algorithms for convenience:
  – Fast: no simulation at time of diagnosis
  – Self-contained: netlist, simulator, and test set not needed after dictionary creation
• Can be very large, however!

Page 19: Fault Diagnosis Overview

The Full-Response Dictionary

• For each fault ( f ), store the response to each test vector ( v )
• For each vector, store the expected output response ( o outputs )
• One bit per output for each vector, pass ( 0 ) or fail ( 1 )
• Total storage requirement: f × v × o bits

Page 20: Fault Diagnosis Overview

The Pass-Fail Dictionary

• For each fault, store only whether each test vector passes or fails
• One bit per vector, pass ( 0 ) or fail ( 1 )
• Total storage requirement: f × v bits
• Much smaller than full-response, and often practical for even very large circuits
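
The two storage formulas (f × v × o bits for full-response, f × v bits for pass-fail) can be compared with a little arithmetic. The circuit size below is a made-up example, and `to_pass_fail` is a hypothetical helper showing how one fault's per-output fail bits collapse to one bit per vector.

```python
def full_response_bits(faults, vectors, outputs):
    """Storage for a full-response dictionary: f x v x o bits."""
    return faults * vectors * outputs


def pass_fail_bits(faults, vectors):
    """Storage for a pass-fail dictionary: f x v bits."""
    return faults * vectors


def to_pass_fail(per_vector_output_fails):
    """Collapse per-output fail bits into one pass(0)/fail(1) bit per vector."""
    return [int(any(output_bits)) for output_bits in per_vector_output_fails]


# Hypothetical circuit: 100,000 faults, 1,000 vectors, 500 observed outputs.
f, v, o = 100_000, 1_000, 500
print("full-response:", full_response_bits(f, v, o) / 8 / 1e9, "GB")
print("pass-fail:    ", pass_fail_bits(f, v) / 8 / 1e6, "MB")

# One fault, three vectors, four outputs: only the second vector fails.
print(to_pass_fail([[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]))
```

The pass-fail form is smaller by a factor of o, at the cost of discarding which outputs failed.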

Page 21: Fault Diagnosis Overview

Dynamic Diagnosis

• Alternative to dictionary-based diagnosis
• Fault simulation is only done for certain faults, based on test results
  – Only simulate faults in input cones of failing flip-flops/outputs
• Dictionary is eliminated, but requires complete netlist and test pattern file
• Used by most commercial ATPG tools: Mentor Fastscan, Synopsys, Cadence, etc.
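
The fault-selection step can be sketched as a filter over the faultlist. This is a simplified illustration, not how any of the named tools is implemented: the precomputed cones, node names, and `faults_to_simulate` are hypothetical.

```python
# Hypothetical precomputed input cones: observation point -> nodes feeding it.
INPUT_CONES = {
    "ff1": {"a", "b", "g1"},
    "ff2": {"b", "c", "g2"},
    "out3": {"d", "e", "g3", "g4"},
}
FAULTLIST = [(node, stuck)
             for node in "a b c d e g1 g2 g3 g4".split()
             for stuck in (0, 1)]


def faults_to_simulate(failing_points, faultlist=FAULTLIST, cones=INPUT_CONES):
    """Keep only faults located in the input cone of some failing point."""
    reachable = set().union(*(cones[point] for point in failing_points))
    return [fault for fault in faultlist if fault[0] in reachable]


selected = faults_to_simulate(["ff2"])
print(f"simulating {len(selected)} of {len(FAULTLIST)} faults:", selected)
```

The sketch takes the union of the failing cones, which is the conservative reading; under a strict single-fault assumption the intersection of the cones would narrow the list further.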

Page 22: Fault Diagnosis Overview

Outline

• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Page 23: Fault Diagnosis Overview

Algorithm Details

• Role of a diagnosis algorithm
• Scoring methods
• Types of diagnosis algorithms

Page 24: Fault Diagnosis Overview

Diagnosis Algorithms

• Algorithms compare observed behavior to predicted behaviors
• An algorithm attempts to “explain” the observed failures with fault candidates
• The job of a diagnosis algorithm is to report the best fault candidate(s)
• “Best” is determined by the scoring method

Page 25: Fault Diagnosis Overview

Fault Candidate Scoring

• Two common scoring methods
  – Match/mismatch points
  – Fault candidate probability
• Other common scorings:
  – Hamming distance
  – Set intersection/overlap
  – Nearest neighbor

Page 26: Fault Diagnosis Overview

Match/mismatch Point Scoring

• Award points for matching observed failures
• Optionally deduct points for not predicting fails
• Nonprediction: A behavior not predicted by candidate
• Misprediction: A prediction not fulfilled by behavior
• Commercial tools (e.g. Fastscan) are usually biased to lowest nonprediction
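
Match/mismatch scoring can be sketched with set operations on failing vector numbers. The point weights, the ranking rule, and the candidate data are assumptions chosen for illustration; this is not the scoring used by Fastscan or any other named tool.

```python
def score(observed_fails, predicted_fails, mismatch_penalty=1):
    """Score one candidate from sets of failing vector numbers."""
    matches = observed_fails & predicted_fails
    nonpredictions = observed_fails - predicted_fails   # observed, not predicted
    mispredictions = predicted_fails - observed_fails   # predicted, not observed
    mismatches = len(nonpredictions) + len(mispredictions)
    return {
        "points": len(matches) - mismatch_penalty * mismatches,
        "matches": len(matches),
        "nonpredictions": len(nonpredictions),
        "mispredictions": len(mispredictions),
    }


observed = {2, 5, 7, 9}                 # vectors that failed on the tester
candidates = {                          # hypothetical simulated fail sets
    "fault A": {2, 5, 7, 9, 11},        # explains everything, one extra fail
    "fault B": {2, 5},                  # explains only half the failures
}
# Rank with fewest nonpredictions first, then by points.
ranked = sorted(
    candidates,
    key=lambda name: (score(observed, candidates[name])["nonpredictions"],
                      -score(observed, candidates[name])["points"]),
)
print(ranked)
```

Sorting on nonpredictions first mirrors the "biased to lowest nonprediction" behavior: a candidate that explains every observed failure outranks one that leaves failures unexplained, even if it predicts extra failures.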

Page 27: Fault Diagnosis Overview

Probabilistic Scoring

• Probability score based on matches and mismatches and error assumptions
  – Weights for non- and mis-prediction
  – Different prediction probabilities for different fault candidates (bridges vs. stuck-at)
• Usually normalized so that total of all candidates equals 1.0
• UCSC method uses probabilities to compare stuck-at candidates to bridges in same diagnosis
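
One simple way to turn matches and mismatches into normalized probabilities is sketched below. It assumes each vector is an independent trial with per-candidate error rates; this is an illustrative model, not the UCSC method itself, and every candidate name and probability is made up.

```python
def likelihood(observed, predicted, p_nonprediction, p_misprediction):
    """Probability of the observed pass/fail bits given one candidate.

    p_nonprediction: chance that a vector the candidate predicts to pass fails.
    p_misprediction: chance that a vector the candidate predicts to fail passes.
    """
    probability = 1.0
    for obs, pred in zip(observed, predicted):
        if pred == "1":
            probability *= p_misprediction if obs == "0" else 1 - p_misprediction
        else:
            probability *= p_nonprediction if obs == "1" else 1 - p_nonprediction
    return probability


def normalized_scores(observed, candidates):
    """Score every candidate and scale so the scores total 1.0."""
    raw = {
        name: likelihood(observed, predicted, p_non, p_mis)
        for name, (predicted, p_non, p_mis) in candidates.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


observed = "0110"
# Hypothetical candidates: (predicted bits, p_nonprediction, p_misprediction).
# The bridge is given a higher misprediction rate than the stuck-at faults.
candidates = {
    "n5 stuck-at 1": ("0110", 0.05, 0.05),
    "n9 stuck-at 0": ("0100", 0.05, 0.05),
    "bridge n5-n8": ("0111", 0.05, 0.30),
}
for name, probability in normalized_scores(observed, candidates).items():
    print(f"{name}: {probability:.3f}")
```

Because every candidate ends up on the same 0-to-1 scale, stuck-at and bridging candidates can be ranked in one list even though they use different error assumptions.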

Page 28: Fault Diagnosis Overview

Types of Diagnosis Algorithms

• Stuck-at
  – Most common, best supported by tools
  – Surprisingly effective (~60% exact matches)
  – Very fast
• IDDQ
  – Orthogonal set of failing data
  – Requires interpretation of tester results
  – Not well supported by tools

Page 29: Fault Diagnosis Overview

IDDQ Threshold Setting

[Figure: plot with a vertical axis running from 0 to 180 and a horizontal axis running from 0 to 200; axis labels are not recoverable.]

Page 30: Fault Diagnosis Overview

Types of Diagnosis Algorithms (Cont)

• Bridging-fault
  – May better represent common CMOS faults
  – More complicated fault model
  – Biggest problem: candidate selection
• Other possible (future) directions:
  – Functional fails
  – Delay fails
  – Parametric failures

Page 31: Fault Diagnosis Overview

Outline

• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Page 32: Fault Diagnosis Overview

Diagnosis in Practice

• Using a diagnosis
• Translating the results: circuit navigation
• Evaluating diagnosis quality
• Commercial diagnosis tools

Page 33: Fault Diagnosis Overview

Using a Diagnosis

• Fault diagnosis is used to aid physical inspection and root-cause identification
• Diagnosis output is logical, not physical:
  – Abstract faults (such as stuck-at)
  – Gates, ports (nodes), and nets
  – No information about location or size
• Translation to physical location requires navigation of the circuit

Page 34: Fault Diagnosis Overview

Types of Circuit Navigation

• Netlist
  – Examine RTL (Verilog/VHDL etc.) for gates and data paths
• Schematic
  – Symbolic view of gates and wires
• Layout/artwork
  – Graphical view of metal lines, poly, vias, cell boundaries, etc.

Page 35: Fault Diagnosis Overview

Circuit Netlist

module TOP (CLK, Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin,
            Wr_RAM, Wr_Rreg, RAM_Addr, ATG_TESTMODE, BIST_TESTMODE, SDout,
            TwoOnes, OneOne, NoOnes, TwoZeros, OneZero, NoZeros);

  input CLK;
  inout Reset, StartOut, SiReady, Rst_CntN, Up_DnN, Wr, SDin, Wr_RAM;
  inout [2:0] RAM_Addr;
  inout ATG_TESTMODE;
  inout BIST_TESTMODE;
  inout SDout, OneZero, NoZeros;
  inout TwoOnes, OneOne, NoOnes, TwoZeros, Wr_Rreg;

  // Tie off cells
  TLOW tielow1 (.Q(tielow));
  THIGH tiehigh1 (.Q(tiehigh));

  // Inverted CLK
  wire CLK_N;
  INVFF clkinv (.Q(CLK_N), .A(CLK));

  // PADS
  PADNMIOSCM0H08N05B50 PAD001_StartOut (
    .PUEN(tiehigh), .PDE(tielow), .IEN(tielow), .I(StartOut_I),
    .SIGNAME(StartOut), .INMODE(in_mode_avail), .TESTI(jumper001),
    .TESTIEN(tiehigh), .SCANIN(jumper001), .OUTMODE(out_mode_avail),
    .TESTO(tiehigh), .TESTOEN(tiehigh), .O(tielow), .OEN(tiehigh));

Page 36: Fault Diagnosis Overview

Netlist Navigation

• Either use text editor on netlist, or use browser function in simulator

• Browsers allow you to trace forward and backward and see logic values

• Can be used to view hierarchy and functional blocks

• Can be tedious

Page 37: Fault Diagnosis Overview

Circuit Schematic

Page 38: Fault Diagnosis Overview

Schematic Navigation

• Either hand-drawn (from netlist navigation) or tool-generated gate symbols and wires
• Schematic tools in simulators also allow forward and backward traversal and display of logic values
• Used to verify fault propagation
• Does not reflect physical distances

Page 39: Fault Diagnosis Overview

Circuit Artwork

Page 40: Fault Diagnosis Overview

Layout (Artwork) Navigation

• Use routing/floorplanning tools to view artwork
• Can usually input a cell or wire name and the tool will highlight the object
• Useful for determining (x,y) values
• Also good for evaluating physical implications of a set of fault candidates
  – Faults clustered in a small area are good
  – Faults/nets spread around large die areas are bad
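
The "clustered is good, spread out is bad" judgment can be roughed out numerically once (x,y) values are known. The bounding-box measure below is just one assumed proxy for how much die area must be inspected, and the coordinates are made up.

```python
def bounding_box_area(points):
    """Area of the smallest axis-aligned box containing all (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


# Hypothetical candidate locations in microns, as read off a layout viewer.
clustered = [(1200, 3400), (1215, 3390), (1230, 3420)]
spread_out = [(150, 200), (4800, 300), (2500, 4900)]

print("clustered candidates:", bounding_box_area(clustered), "um^2")
print("spread-out candidates:", bounding_box_area(spread_out), "um^2")
```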

Page 41: Fault Diagnosis Overview

Fault Proximity

Faults contained in small area: physical examination is possible

Net runs across die: physical examination is almost impossible

Page 42: Fault Diagnosis Overview

Evaluating a Diagnosis

• A diagnosis without one or a few strong (high-scoring) candidates is usually poor
• Can indicate:
  – Multiple defects
  – Unmodeled (complex) behavior
  – Inappropriate algorithm
• If the diagnosis is poor, either try another algorithm or look for more data (failures)

Page 43: Fault Diagnosis Overview

Evaluating a Diagnosis (cont)

• Many diagnoses (~60%) implicate a single stuck-at fault

• Usually a good sign, but you must consider equivalent faults

• Many defects can mimic a stuck-at fault, without being a short to Vdd or Gnd

• Consider nearby nodes also, if practical

Page 44: Fault Diagnosis Overview

Dominance Bridging Fault

[Figure: circuit with a FIB short involving a strong inverter and a weak inverter. Annotation: Top candidate is stuck-at fault on this node.]

Page 45: Fault Diagnosis Overview

Candidate #2 is Best

[Figure: the FIB short shown alongside Candidate #1, Candidate #2, and Candidate #3.]

Page 46: Fault Diagnosis Overview

Commercial Tool: Mentor Graphics

• ATPG tool: Fastscan
• Stuck-at diagnosis only
• No IDDQ capability
• Orders candidates by number of matched failures (biased to lowest nonprediction)
• Also has netlist & schematic browser
• Based on Waicukauski & Lindbloom (D&T ’89)

Page 47: Fault Diagnosis Overview

Commercial Tool: Synopsys

• ATPG tool: TetraMAX
• J. Waicukauski moved to Synopsys after writing Fastscan
• Diagnosis capability unknown: assumed to be similar to Fastscan

Page 48: Fault Diagnosis Overview

Commercial Tool: Cadence

• ATPG tool: Encounter Test
• Test and diagnosis tools purchased from IBM
• IBM has had good diagnosis research, but Encounter’s capabilities are unknown
• Also of interest: Silicon Ensemble, a routing tool
• Graphical artwork viewer
• Good for highlighting nets and cells based on diagnosis results
• Good for determining (x,y) and producing screen shots

Page 49: Fault Diagnosis Overview

Outline

• Introduction: What is Fault Diagnosis?
• Components: What’s involved?
• Algorithm details: How does it work?
• Diagnosis in practice: How does it really work?
• Research: Why does (or doesn’t) it work? How should it work?

Page 50: Fault Diagnosis Overview

Prior Art

• Waicukauski & Lindbloom, IEEE Design & Test, Aug. ’89
  – Most widely used algorithm for commercial tools
  – Finds candidates to match individual tests, attempts to “explain” all failing tests
• Abramovici & Breuer, IEEE Trans. Computers, June ’80
  – Effect-cause diagnosis
  – Permanent stuck-at fault assumption
• Aitken & Maxwell, HP Journal, Feb. ’95
  – Analysis of relative importance of models vs. algorithms
• Lavo, Larrabee, et al., Proceedings of ITC ’98
  – Probabilistic scoring
  – Mixed-model diagnosis
• Bartenstein et al., Proceedings of ITC ’01
  – SLAT: Single Location At-a-Time diagnosis
  – Focus on matching per-vector results

Page 51: Fault Diagnosis Overview

Prior Art (cont)

• Jee & Ferguson, Proceedings of ISTFA ’93
  – Carafe: Inductive Fault Analysis (IFA)
  – Examine circuit to determine likely failure locations
• Aitken, Proceedings of ITC ’95
  – Using FIBs to insert defects
  – Calibrate/evaluate diagnosis methods
• Henderson & Soden, Proceedings of ITC ’97
  – Probabilistic physical failure analysis
• Nigh, Vallett, et al., Proceedings of ITC ’98
  – Large-scale, multi-company SEMATECH experiment
  – Failure analysis of timing and IDDQ fails

Page 52: Fault Diagnosis Overview

Research Directions

• Complex defect behaviors
  – Beyond stuck-at and 2-line bridges
  – Intermittent faults
  – Delay and timing-related defects
  – Parametric & process-related defects
  – Multiple simultaneous defects
  – Is there a simple, inductive way to infer complex defects?

Page 53: Fault Diagnosis Overview

Research Directions (cont)

• Diagnosability
  – What makes a particular circuit easy or hard to diagnose?
  – What can we do to make diagnosis easier?
• Evaluation of diagnoses
  – What makes a good diagnosis?
  – Can we quantify our confidence in a diagnosis?

Page 54: Fault Diagnosis Overview

Research Directions (cont)

• Integration with physical FA & yield improvement
  – Can we incorporate process information?
  – Can we produce a “physical diagnosis”?
  – On-line (or even on-chip) diagnosis
• Commercial toolflow integration
  – Can diagnosis tools use industry-standard data formats?
  – Can commercial tools be scripted or programmed to do better diagnosis?