BIST-Based Fault Diagnosis Presence Embedded...

VLSI DESIGN2001, Vol. 12, No. 4, pp. 487-500Reprints available directly from the publisherPhotocopying permitted by license only

(C) 2001 OPA (Overseas Publishers Association)N.V.Published by license under

the Gordon and Breach Science Publishers imprint,member of the Taylor & Francis Group.

BIST-Based Fault Diagnosis in the Presenceof Embedded Memories*

JACOB SAVIR

ECE Dept., New Jersey Institute of Technology, University Heights, Newark, New Jersey 07102-1982

(Received 15 August 1999; In finalform 11 September 2000)

An efficient method is described for using fault simulation as a solution to the diagnosticproblem created by the presence of embedded memories in BIST designs. Thesimulation is event-table-driven. Special techniques are described to cope with the faultsin the Prelogic, Postlogic, and the logic embedding the memory control or addressinputs. It is presumed that the memory itself has been previously tested, using automatictest pattern generation (ATPG) techniques via the correspondence inputs, and has beenfound to be fault-free.

Keywords: Fault detection; Fault simulation; Fault diagnosis; Built-in Self-Test (BIST); Multiple-Input Shift Register (MISR); Signature

1. INTRODUCTION

Built-in self-test (BIST) has seen increased use bymajor computer and chip manufacturing compa-nies over the last decade. Even though BISTmethodology as a test vehicle has reached itsmaturity, more is required in advancing anddefining an efficient diagnostic process. Withoutsuch a process, a BIST-based diagnostic becomes acomplete nightmare.

Since BIST-based diagnostic is a very compli-cated, costly, and time consuming task, it isnecessary to have several diagnostic aids to guidethe process. First degree of complication stems

from the fact that BIST uses data compression,and the only thing available at the end of the test isa short signature. The second degree of complica-tion (and not less serious one) is due to thepresence of embedded memories. Errors (due tosome faults) penetrate the memories during BISTonly to show up at a later time as faulty signature.This error latency is one of the primary causes asto why the diagnostic success is quite poor. Still, athird degree of complication is due to a lack of"design for diagnosability" in BIST structures.Most BIST designs concentrate on efficient faultdetection, but lack the necessary "hooks" to easethe diagnostic task. As this paper will show,

* This work was supported by NJCST for the Center on Embedded System on a Chip Design.lit is assumed here that no boundary scan exists around the memories because of strict performance requirements.

487

488 J. SAVIR

proper distribution of the MISRs in the design canhighly increase the diagnostic resolution, as well asdecrease the amount of time spent on isolating thefaults.

Diagnostic aids constitute a "bag of tricks". Thetask of these diagnostic aids is to provide "shortcuts" in pinning down the failing pattern (out ofseveral million). Once the failing pattern isdetermined, deterministic diagnostic practices canbe used to isolate the fault (or fault equivalenceclass). In [7, 12] two such diagnostic aids have beendescribed to identify the failing test in combina-tional logic. These methods may be used quitesuccessfully to diagnose failures in the combina-tional logic between shift register latches, providednone of the memories have been corrupted.

Other prior art is relevant to this paper. In [2]general discussion of BIST diagnostic is provided.In [9] an efficient BIST scheme for naked memoriesis described for which fault diagnosis comes forfree. Still another BIST scheme with self diagnosticcapabilities are described in [8]. A memory BISTthat is based on error detecting codes is describedin [6]. BIST and diagnosis of analog circuits isdescribed in [5]. BIST for static RAMs is describedin [13]. In [3] a description is given for BIST forembedded RAMs.The diagnostic objectives of this paper are

several fold: isolate failures during chip bring-up(lst silicon); deal with design and process changes;rectify fabrication problems, etc.The test environment can simply be described as

follows. A logic structure which conforms to thelevel sensitive scan design (LSSD) [4] ground rules,and which contains embedded random accessmemories (RAMs) is being tested with pseudo-random patterns. The responses of the structureare being compressed in a single-input or multiple-input signature register (SISR or MISR respec-tively). A faulty signature is observed after testcompletion. It is necessary to identify the faultlocation in order to permit a repair action. Itis assumed that the memory itself has beenpreviously tested using known memory test

techniques and found to be fault-flee. It is furtherassumed that a single fault exists in the faultycircuit that undergoes diagnostics.

This paper first describes the simulation-baseddiagnostic methodology as applied to a simple,single-port, Read/Write memory. This simple caseis then generalized to BIST designs having multi-ple single-port, Read/Write memories. The diag-nostic bottlenecks that exist in this, more generalcase, are then identified. It is shown that there is aneed to distribute the MISRs throughout thedesign, rather than have a single MISR collect asingle signature, in order to ease the diagnostictask. The distributed MISRs (ideally one for eachembedded memory) add a degree of observabilitythat is very useful during diagnosis (this does notnecessary add any more power to the faultdetection capability, though). MISRs distributionshould be considered as one dimension in thedesign for diagnosability. Obviously, there is anadded cost to this design.

Section 2 describes the main diagnostic proce-dure. Sections 3-6 describe the core properties ofdiagnosing failures in BIST circuits having a singlememory. More specifically, Section 3 describes thememory and the techniques for fault identificationin the memory Prelogic (the logic feeding the data-in lines) and Postlogic (the logic fed by the data-out lines). In Section 4 we describe a mechanismfor locating the fault in the address or controllogic, and in Sections 5 and 6 we show specialtechniques for isolating address and control logicfaults. Section 7 discusses fault diagnosis in circuitshaving multiple embedded memories. Section 8outlines the enhanced diagnostic procedure. Sec-tion 9 shows some experimental results. Finally,Section 10 concludes with a listing of the specialcontrol functions that our methodology requiresfrom the memory and embedding logic design, andsummarizes the important findings of this paper.

In the sequel it is assumed that the reader isfamiliar with deterministic diagnostic practicesand basic BIST diagnostic methodology forLSSD-based designs [1,2].

FAULT DIAGNOSIS 489

2. MAIN DIAGNOSTIC PROCEDURE

As a framework for discussion we first introducethe main diagnostic routine as practiced withBIST-based products which also conform with theLSSD design rules.

Figure shows the circuit (chip, card, etc.)view in BIST mode. The BIST hardware is shownseparate from the functional circuit. The BISThardware includes two linear feedback shiftregisters (LFSRs) driving random inputs intothe primary inputs (PIs) and shift register input(SRI). The MISR and the SISR collect the testresponses from the primary outputs (POs) andshift register output (SRO). It is inherent that theentire circuit conforms with the LSSD designrules. Figure shows a single scan chain whoseinput is SRI and whose output is SRO. In casethere are multiple scan chains, multiple SRIs andSROs will exist connecting to their separateLFSR and SISR. The A/B inputs are the scanchain shift clocks, and the MCs are the functionalmachine clocks.A large number of tests are to be applied to the

functional unit. For each test the controller must

PIs

LFSR FUNCTIONAL CIRCUIT

NB

CONTROLLER

MCs SRO

POs

[MISR

FIGURE Circuit view in BIST mode.

dothe following:

(1) Load a random (actually, pseudo-random) testvector into the SR string of the functionalcircuit by simultaneously clocking the SRILFSR and the A/B shift clocks. On each clockcycle, one new pseudo-random bit is generatedby the LFSR and shifted into the SRI. Enoughshift clock cycles are applied to fill the scanstring completely.

(2) Apply one clock cycle to the PI LFSR to applya new pseudo-random vector to the PIs.

(3) Cycle through the machine clocks (MCs) tocapture the responses to the random stimuli inthe scan string latches.

(4) Unload the responses on the POs to the MISRby applying one clock cycle to the MISR shiftclocks.

(5) Unload the captured responses in the internallatches via the scan path to the SISR bysimultaneously clocking the A/B clocks andthe SISR shift clocks.

Steps and 5 of the process can be overlappedsince it is possible to unload the SR string into theSISR at the same time as the next pseudo-randompattern is loaded into the SRI. A pass/failindication is obtained after the last test bycomparing the binary values remaining in theMISR and SISR with the expected values.

In order to ease the diagnostic process the entiretest is divided into smaller test windows for whichthe expected signatures are precomputed. Innormal BIST mode the entire test is run andsignatures are compared at the end of the test. Ifthe signatures at the end of the test are correct, thecircuit is declared fault-free. If the signatures areincorrect the diagnostic process is invoked tosuggest a repair action. To do this, the test isrerun stopping at the end of each test window tocompare intermediate signatures against theirprecomputed values. The test continues till a testwindow is reached that captures a wrong signa-ture. It is, then, most likely that the first failing testhas occurred during this last test window. This is

490 J. SAVIR

true in most, but not all, cases. In some cases thediagnostic procedure may experience a latent fault,where wrong data is written to some memorylocation, but being read out at a much later time.Thus, a latent fault may be captured in one testwindow and show up as an error at a later one. Wewill say more about this case later in the paper.The diagnostic process begins by restoring the

state of the machine to what it was at thebeginning of the failing test window. The testmode is then switched from BIST to LSSD/DIAGNOSIS mode. In this mode the MISR andthe SISR act as pure shift registers, in order to beable to investigate each and every test patterndeterministically. The last test window is thenrepeated, pattern by pattern, till the failing patternis encountered. Deterministic diagnostic practicesare then invoked to isolate the fault, or faultequivalence class [1].

3. PRELOGIC AND POSTLOGICFAULTS IN A SIMPLE R/W MEMORY

To simplify the discussion we assume a single portRead/Write memory within combinational logic asshown in Figure 2. We will address circuits withmultiple embedded memories later in the paper.

Figure 2 shows the circuit as seen at the time thediagnostic process begins. Notice that the BISThardware is not shown. Inputs to the structure arethe primary input groups marked D1, D2, D3, andD4. The only observation points are the PrimaryOutputs (POs). The memory contains 2n words ofn bits each (this is a special case in which thememory word is exactly as wide as the addressspace, and will later be generalized). The numberof data-out lines is equal to the number of data-inlines. When the R/W Control Line is set to Write,the word on the memory data-in lines is written tothe selected address in a write-through mode (sothat the memory data-out lines take on the valueof the word written). When the R/W Control Lineis Set to Read, the word stored at the selectedaddress appears at the memory data-out lines. All

D LD O

D3 R GES CS

CONTROL

LOGIC

R/AD(0)

AD(n-I)

D4

L PRELOGIC

DATA-IN

RANDOM ACCESS MEMORY

2 Words of Bits

DATA-OUT

POSTLOGIC

PRIMARYOUTPUTS

FIGURE 2 A simple R/W embedded memory.

primary input and primary output values are savedfor each test in the sequence.The first step in the method is to perform good-

machine simulation on just the Control Logic andthe Address Logic to determine both the memoryaddress referenced and the type of operation, Reador Write, for each of the K tests in the testsequence. This information is used to construct anEvent Table, a self-explanatory example of whichis shown in Table I.The next step is to use this event table to

construct a Simulation Event Table which shows,for each test (now called an event), the test number

Test

TABLE Event Table for the memory of Figure

Address R/W9 Read

2 61 Write3 37 Write4 43 Read5 10 Write6 37 Write7 10 Read8 43 Read9 10 Read

K 38 Read

FAULT DIAGNOSIS 491

at which the word on the memory data-out lineswas written. An example, derived from eventTable I, is given in Table II. In Table II, the entry0 in the column "Data Written at Test No".indicates that the data-out lines take the initializa-tion word for the selected address (since thisparticular memory address has not been writtensince initialization).The last step is to use the Simulation Event

Table to move the primary input values whichgenerated a word to the test at which that wordwas read out. By doing this, the memory iseffectively removed from the structure and simula-tion can proceed on a tester-loop-independentbasis. Figure 3 shows the simulation model to beused for Test 7 of Table II. The D1 inputs arethose that existed at Test 5 since from theSimulation Event Table the word read at Test 7had been written during Test 5. The D4 inputs arethose that existed at Test 7. It is apparent that thePO values in Figure 3 are the values read out of thestructure at Test 7.

There is a second simulation model that has tobe used if the data-out word for a particular test isthe initialization word. It is presumed that thewords initially in each memory location areknown. As a matter of fact, these are the contentsof the memory words at the start of the testwindow. (In order to ease the diagnostic processthe entire test is divided into smaller test windows

TABLE II Simulation Event Table constructed from Table

Data writtenEvent Address at Test No.

9 02 61 23 37 34 43 05 10 56 37 67 10 58 43 09 10 5

K 38 27

D4(TT)

DI(T5)

PRELOGIC

POSTLOGIC

0$ (TT)

FIGURE 3 Tester-loop-independent simulation of Test 7.

for which the expected signatures are precom-puted.) Figure 4 shows this alternative model usingas an example Event 8 from Table II. Again itshould be evident that the PO values from Figure 4are the same as those captured at the actual Test 8.The Simulation Event Table is used to construct

simulation models of the type shown in eitherFigure 3 or Figure 4, and good machine simulationis used to compute the expected PO values for eachtest. These expected values are compared with theactual test values, received during test, to identifythe failing tests.The first diagnostic assumption is that a single

fault exists in the structure and that the fault is ineither the Prelogic or the Postlogic. Simulationmodels of the types shown in Figures 3 and 4are fault simulated to identify a fault (or faultequivalence class) which satisfies the failing output

D4(T8)

Initialization word at address 43

POSTLOGIC

PFllMARY OUTPUTS (T8)

FIGURE 4 Tester-loop-independent simulation of Test 8.

492 J. SAVIR

values. If such a fault is found, either in thePrelogic or Postlogic, the job is done.2 However,when multiple failing tests occur, it may turn outthat no single fault can explain all failing tests.With incompatible fault equivalence classes inPrelogic and Postlogic, one could postulate multi-ple faults, but it seems more appropriate tomaintain the assumption of a single fault and tosuspect a fault in either the Control or the Addresslogic.

4. SEPARATING ADDRESSFROM CONTROL LOGIC FAULTS

A fault in the Control Logic will either cause anunintended write or inhibit an expected write. Afault in the Address Logic will access (either a reador a write operation) an incorrect address. Toidentify which of the two logics contains the faultit seems necessary to postulate an intelligentdiagnostic program working from a list ofheuristics, the Event Table, the Simulation EventTable, and the set of failing tests. The heuristicsmight include the following:

(1) Suppose all (or most of) the write operationsfail. Since writing is in a "write-through" moderegardless of the addresses selected, weight isadded to the hypothesis of a Control Logicfault.

(2) Suppose all (or most of) the reads of someaddress A fail, that writes to A pass, and thatreads of at least some other addresses pass.Weight is added to the hypothesis of anAddress Logic fault.

(3) Suppose a failing test is a write operation tosome address A. There may be a subsequentread of A (without any intervening writes). Ifthis read also fails, a Control Logic fault issuggested.

(4) Suppose a read of address A fails. If there is apreceding write to A that passes, and if thereare earlier reads which also pass, an AddressLogic fault is suggested.

It is important to note, of course, that theheuristics must be tailored to the type of memorybeing tested. However, the purpose of the intelli-gent program is to identify which logic probablycontains the fault for the purpose of reducingsubsequent computation, and avoiding unexplain-able error symptoms.

5. ADDRESS LOGIC FAULTSIN A SIMPLE R/W MEMORY

Suppose now that the program suggests the faultlies in the Address Logic. We want to performfault simulation on the address logic to identifythe fault. We start by noticing that the memoryword is exactly as wide as the address space. (Wewill describe later how to cope with memories inwhich the word is narrower or wider than theaddress space). Now by using the correspondenceinputs3 and the address cycling register,4 wedeterministically load each memory word withits address as illustrated in Table III. Thiscapability is assured since the memory itself waspreviously tested good using the correspondenceinputs.The next step is to rerun part of the test

sequence on the structure with the memory controllogic (which is assumed to be fault-free) held to aconstant read. To do this requires generating adeterministic vector for inputs D2 of Figure 2which will hold the R/W line at a Read value.(Alternatively, since we determined which testscaused read operations in the process of generatingTab. I, the Event Table, we could use one of theseread-invoking sets of D2 values).

2 Because of the single-fault assumption.These are inputs that feed directly, or indirectly, the data-in lines, R/W control, and address lines, so that when a proper vector is

set, it is possible to write into the memory arbitrary data.4Normally added to BIST designs in order to ease cycling through the address space; implemented by a counter.

FAULT DIAGNOSIS 493

TABLE III Data written to addresses prior to address logicsimulation

Data writtenAddress to the address

0 000...00000...01

2 000...103 000...11

2n- 111...11

During this rerun of the tests, the Prelogic is outof the circuit, since we are holding a constant read.Because each word in the memory contains itsaddress, whenever an address A is applied to thememory decoder inputs the address A appears onthe memory data-out lines. Effectively the memoryhas been removed and a structure created whichbehaves exactly like the circuit shown in Figure 5.In Figure 5, test T7 is simulated. The D3 inputs tothe Address Logic and the D4 inputs to thePostlogic are those from T7. Notice, however, thatthe PO values T*7 are not those that were recordedfrom the original Test 7 since the memory contentsare not the same. We must do good-machinesimulation of the structure of Figure 5 to obtainthe expected T* output values for comparison withthe values obtained from the test rerun. Given theexpected values and the actual values from

D3(T7)

ADDRESS

LOGIC

D4(T7) POSTLOGIC

PRIMARY OUTPUTS (T*7)

FIGURE 5 Structure for fault simulation of Address Logic.

rerunning the tests, we can do fault simulationon the structure of Figure 5 to identify the fault (orequivalence class) within the Address Logic.The question now is how to identify the tests

which must be repeated during simulation of theAddress Logic. We earlier used good machinesimulation on models of the types shown inFigures 3 and 4 to compute the expected POvalues for comparison with the actual values toidentify the failing tests. Let T. be the first failingtest. Now suppose that a prior test, say test Ti, wasa write operation to some address A. During Ti thememory is in write-through mode, and since thePrelogic and Postlogic are fault-free the testpasses, regardless of the Address Logic fault. Ifduring Ti the fault affects the memory decoderinputs, then instead of writing address A, the faultcauses a write to an incorrect address B. Asubsequent read operation on address A (oraddress B) retrieves an incorrect word, and theread operation fails. Hence we find it impossible toidentify the precise "failing tests" other than to saythat there was at least one failing test prior to test

T. It is, therefore, necessary to repeat all of thetests up to and including T. For each of thesetests, the models of Figure 5 are good-machine-simulated to obtain PO values for comparisonwith the actual values obtained from the tests. Thiscomparison will identify the actual "failing tests"which are then used in fault simulation (again withmodels like Fig. 5) to identify the fault.

If the memory word is narrower than theaddress space, we are forced into a multiple-passtest of the Address Logic. On each pass a portionof its address is stored in each memory word. Forexample, suppose the address space requires 8 bitsbut the memory is only 4 bits wide. On the firstpass we load the low-order 4 bits of the address toits word, and on the second pass load theremaining address bits. The appropriate portionof the test sequence then is repeated twice, once foreach pass, again holding a constant read. For eachpass we record the PO values. We, then, combinethe PO values for each test and compare thesevalues against a simulation model of the type

494 J. SAVIR

D4 Inputs

D3

ADDRESS

LOGIC

POSTLOGICPOs

1st

Pass

POSTLOGICPOs

2nd

Pass

FIGURE 6 Address Logic simulation when word is narrow.

shown in Figure 6. By comparing the values on thecombined POs during good machine simulationwith the actual values recorded during the two testpasses, we can identify the failing tests. Thesefailing tests can in turn be fault simulated, againusing models like Figure 6, to identify the fault.Note that the fault sites in the two postlogicreplicas need not be simulated since it is known, atthis point, that the Postlogic is fault-free.

If the memory word is wider than the addressspace, the surplus bits in each word may be loadedwith 0s, for example.

most straight forward case, and will later considerexpected reads.

6.1. When an Expected WriteOperation Fails

Consider a failing test for which the expectedoperation is a write to some arbitrary address A.Since the memory is write-through, the expectedword on the memory data-out lines is the same asthe word on the data-in lines. Furthermore, thePrelogic and the Postlogic are fault-free, so thatthe test should have passed if the operation wasactually a write. Since the test failed, we concludethat the actual operation was a read of address A.The foregoing argument applies to any failing

test in which the expected operation is a write. Wethen have the following: if we are sure that thefault lies in the Control Logic, then all failing testswhich are good-machine writes are actually faultyreads. This means that during such tests we arecertain that the value on the R/W control line isinverted from write to read. We can then do faultsimulation on the Control Logic for these tests toidentify a fault equivalence class which explainsthe failing tests. The simulation structure is shownin Figure 7 for a failing test t which is an expectedwrite. In Figure 7 the D2 inputs to the ControlLogic are those that existed at test t and the outputR/W(t) is incorrectly inverted from Write to Read.

6. CONTROL LOGIC FAULTSIN A SIMPLE R/W MEMORY

Suppose that the intelligent program suggests thefault lies in the Control Logic. We are quite surethat the Prelogic, the Postlogic, and the AddressLogic are fault-free, and will proceed on theassumption that they are indeed fault-free.

Earlier we have identified all the failing tests.From the event table we know the address and theexpected operation (read or write) for these failingtests. We first consider those failing tests for whichthe expected operation is a write, since it is the

D2(t)

CONTROL

LOGIC

RIW(h

FIGURE 7 Structure for Control Logic fault simulation.

FAULT DIAGNOSIS 495

6.2. When an Expected ReadOperation Fails

Consider those cases where the expected operationduring a failing test is a read of some address A. Inthese cases we have an uncertainty between twopossibilities:

(1) The read actually occurred correctly but thetest failed because the word stored at theaddress was incorrect, either because someearlier expected read was changed to a faultywrite, or an earlier write was changed to afaulty read.

(2) The read operation during this test waschanged by the fault to a write.

The first step in sorting out these possibilities isto search the event table for all tests whichaccessed address A prior to the failing test t. Weuse these tests to set up an activity table, anexample of which is shown in Table IV.The test numbers in Table IV are listed in

decreasing value and are those tests prior to andincluding failing test t which accessed address A.The Expected Operation column entries are fromthe event table that was constructed earlier.The Expected PO Values are obtained by a

tester-loop-independent good machine simulationof structures like that shown in Figure 8. In thesesimulations, the D inputs are those from the mostrecent write operation to address A. These areexactly the values which were listed in thesimulation event table constructed in Section 3 toobtain the PO values for each test. However,

TABLE IV Prior accesses to A and their actual PO valueswith D4(t)

Expected Expected Actual Mock-WriteTest # operation PO values PO values PO values

Read PO(Y) PO(X) PO(T)t- tl Read PO(Y) PO(X) PO(U)t- t2 Write PO(Y) PO(X) PO(Y)t- t3 Read PO(X) PO(X) PO(V)t- t4 Write PO(X) PO(X) PO(X)t- t5 Read PO(Z) PO(Z) PO(W)t- t6 Write PO(Z) PO(Z) PO(Z)

Dl(t- t3)

PRELOGIC

D4(t) POSTLOGIC

PO(X)

FIGURE 8 Expected PO values obtained by holding D4 atD4(t).

during each simulation, the D4 inputs to thePostlogic are held at their values during the failingtest t. Each of the prior accesses is simulated in thisfashion to complete the Expected PO Valuesentries in Table IV. (Note that these ExpectedPO Values are not those obtained earlier inSection 3).The column marked Actual PO Values is

obtained from a rerun of the listed tests on theactual structure when during each test the D4inputs to the Postlogic are held at their values duringthe failing test t. All other structure inputs(D1, D2, and D3) are driven with their originalvalues during each test. We record the PrimaryOutput values for each of these test reruns andenter these in the Actual PO Values column.The column marked Mock-Write PO Values is

obtained by doing a mock simulation of a writeoperation. Let test T9 be a test which is a readoperation in the good-machine. Now do a good-machine simulation of the Prelogic as shown inFigure 9 to obtain the PO values T*9. Because thememory is write-through, this simulation reflects apresumed write operation at T9. It is these T*values for each of the listed reads which areentered into the Mock-Write PO Values column.What is the activity table? At test t, some

incorrect word X appeared on the memory

496 J. SAVIR

E41T91

DI(T9)

PRELOGIC

POSTLOGIC

PRIMARY OUTPUTS (T*9)

FIGURE 9 Mock good-machine simulation of a Writeoperation at Test 9.

data-out lines instead of the correct word Y. Thisincorrect word X caused a set of incorrect POvalues PO(X) instead of the correct set PO(Y). Wewant to know how this incorrect word X got intothe memory. Since the Postlogic primary inputsD4(t) at test t exposed the incorrect word, we usethese primary inputs to expose the test at which Xwas loaded into the memory. This should becomeclearer as we describe how to use the table.We compare, starting up from the bottom of

the activity table, the expected, the actual, andthe mock-write values. An inconsistency betweenexpected and actual values, if supported by the

mock-write values, indicates a test on which theR/W control line failed.As an example, in Table IV it is obvious that the

Write operation at test t-t2 the R/W control linewas inverted from Write to Read. Now we can usefault simulation on the Control Logic as shownin Figure 7, using the appropriate D2 values, toidentify the actual fault.The Control Logic fault may cause multiple

R/W line failures before a failing test occurs.Consider, as an example, the sequence shown inTable V where test t is a failing test, and there are3 prior accesses to the address. Beginning atthe bottom of the table (the earliest access) andworking up"

(1) Test t-- wrote word Z correctly.(2) Test t-fl failed. The PO values should have

been PO(Z), but the actual values were PO(X)which matches the Mock-Write values for thistest. Hence, the fault caused the R/W line toinvert from Read to Write.

(3) Test t-a also failed. It should have written Y,but instead the fault inverted the R/W line to aRead.

As a final example, Table VI shows 3 R/W linefailures at test t, t-a, and t-/3.

Test #

TABLE V A multiple-failure activity table

Expected Expected Actual Mock-Writeoperation PO values PO values PO values

Read PO(Y) PO(X) PO(W)t-c Write PO(Y) PO(X) PO(Y)t- fl Read PO(Z) PO(X) PO(X)

"y Write PO(Z) PO(Z) PO(Z)

TABLE VI R/W line failures at Test t, t-a, and t-Expected Expected Actual Mock-Write

Test # operation PO values PO values PO values

Read PO(Y) PO(X) PO(X)ce Write PO(Y) PO(Z) PO(Y)

t-/ Read PO(X) PO(Z) PO(Z)t-- Write PO(X) PO(X) PO(X)

7. DIAGNOSTIC OF CIRCUITSWITH MULTIPLE MEMORIES

When the circuit has multiple memories faultisolation is even more complex. To see this,consider a case of a circuit having two embeddedmemories M1 and M2. Assume further that onlyone MISR exists to collect test responses at thecircuit POs. Let the Simulation Event table beconstructed as was previously suggested. Considerthe case where the table indicates that many readsto some address A in memory M1 fail, and manywrites in memory M2 fail. Is this due to a likelyfault in the Address Logic of memory M1, or afault in the Control Logic of memory M2? The

FAULT DIAGNOSIS 497

answer is that it could be either. Thus, the"intelligent program" is unable to guide thediagnostic process in a manner similar to whatused to be in a case where only a single memoryexists. The diagnosis, then, has to proceed bycarefully examining both cases. This increases thediagnostic time tremendously.What can be done to ease this case? Unfortu-

nately, not much can be done unless specialattention is given to the BIST circuitry at thedesign stage. To see this, consider the same case asbefore, except for the fact that two separateMISRs are used to collect signatures from memoryPostlogics. Under the single fault assumption, onlyone MISR will show the evidence of the fault, andtherefore, only one Simulation Event table will beneeded. The intelligent program will be able toguide the diagnostic process as if the circuit had asingle embedded memory.

This argument suggests quite clearly thatdistributed BIST hardware is preferable to thetraditional design that uses one MISR to collect a

global test signature. Ideally there should be aseparate MISR for each memory Postlogic. Thiscan be done without degrading the fault detectioncapability. It may require, though, a somewhathigher BIST hardware cost. As an alternative todistributed MISRs it is possible to have a singleMISR with gated inputs, so that a signaturereading may be obtained from separate memories.This, however, will increase the diagnostic time,since several reruns with gated MISR inputs wouldbe necessary to isolate the fault. Moreover, thismay require computing more signatures forreference.

There may be cases where a single Postlogic iscommon to more than one memory. In these casesadditional control inputs may be added to makethis shared Postlogic look as two separate ones inBIST mode.There may be cases where a single Prelogic is

shared by more than one memory. In these case itis possible for a Prelogic fault to corrupt multiplememories. This may show up in more than onefaulty MISR signature. As a matter of fact, the

intelligent program should add weight to aprobable Prelogic fault, so that the diagnosticprocedure investigates it first.A decision on how many MISRs to use, and

how to distribute them inside the circuit should bebased on a compromise between the need fordiagnostic efficiency, low hardware overhead andlow performance impact.

8. THE ENHANCED DIAGNOSTICPROCEDURE

The intelligent program has been written andadded to the existing main diagnostic procedure.The intelligent program provides "advice" as towhere to look for the fault based on the"symptoms" that surface in the Simulation EventTable. The enhanced diagnostic procedure, DI-AGNOSE, follows the following steps. It isassumed here that the test has failed at testwindow I. Let be a pointer to the current testwindow that was added to the diagnosticprocess.

(1)(2)

(3)

(4)

(6)

Let i=LIf 0 no single fault can explain the observedfailing signature. The fault is unexplainable.Stop.Restart all test windows from to I and recordeach pattern activity. Complete the EventTable and Simulation Event table for thesetest windows.Perform fault simulation according the adviceprovided by the intelligent program; use theappropriate simulation model as the casewarrants (the different simulation models havebeen discussed earlier).If a single fault (or fault equivalence class)explains all fails-initiate a repair actionaccordingly-Stop. If no such fault (or faultequivalence class) is found-continue.Set i-1 and go to Step 2.

If a single pass through DIAGNOSE does notreveal a repair action, it backtracks to a previous

498 J. SAVIR

test window to try to explain the symptoms by alatent fault that might span more than one testwindow. DIAGNOSE, unless otherwise directed,will keep trying to explain the symptoms by goingas far back as the initial test window. The testengineer, however, may want to limit the back-tracks to save on diagnostic time. If backtrackingdoes not help isolate the fault, multiple faults arethen probable. Since the cost of going aftermultiple faults is by far higher, DIAGNOSE givesup on those.

9. EXPERIMENTAL RESULTS

In order to investigate the efficiency ofDIAGNOSE and the effect of single vs. dis-tributed MISRs on the diagnostic resolution thefollowing experiment has been conducted. CircuitCktl is a portion of a chip that includes a singlesimple memory. Circuit Ckt2 is a chip thathas been designed for BIST applications, has 8memories, but has only a single MISR. CircuitCkt3 has the same function as Ckt2, but insteadof having a single MISR it has 8 (shorter) MISRsdistributed over the chip to achieve separateobservability from each memory. Circuit Cktlhad approximately 30,000 equivalent gates, andcircuits Ckt2 and Ckt3 had approximately250,000 equivalent gates.The total test length was 100 K vectors divided

into 100 windows of K each. We have injected10 random stuck-faults into the circuit modelsand invoked DIAGNOSE. Circuits Ckt2 andCkt3 have been injected with the same set ofrandom faults. The results are reported inTable VII.

TABLE VII Experimental results from DIAGNOSE on 3chip models

No. of explained Avg. diag. Max. no. of testCircuit faults time in hrs. windows backtracked

Cktl 10 0.8 3Ckt2 5 10.2 8Ckt3 9 1.2 15

The first thing to note from Table VII is that ittook on the average 0.8 hours to diagnose thefaults in Cktl, and that DIAGNOSE was able toisolate all faults (up to their equivalence class).This is not much of a surprise since Cktl has onlyone memory, and this was the main vehicle bywhich DIAGNOSE was designed.The interesting part is the comparison between

DIAGNOSE performance on circuits Ckt2 andCkt3. Besides the fact that DIAGNOSE missedmany more faults in circuit Ckt2, it is obvious thatDIAGNOSE has to work harder in this case asevidenced by the average diagnostic time. This isthe case where a single MISR was used tocompress all test data, and obviously DIAGNOSEwas incapable of giving "good" advice as to whereto look for the faults. So, in this case DIAGNOSEwas performing an exhaustive "walk" through theset of possibilities, postulating a fault location, andthen trying to prove or disprove it. Since the totaldiagnostic time given to an experiment waslimited, DIAGNOSE did not have enough timeto backtrack far enough to catch 4 of the latentfaults. In circuit Ckt2 DIAGNOSE was only ableto backtrack a maximum of eight test windows,compared to 15 for circuit Ckt3. Thus, DIAG-NOSE was more efficient in chasing the faults incircuit Ckt3, and successfully explained 9 out ofthe total 10. This higher efficiency was due to theseparate MISRs that existed in Ckt3.Another point to notice is that the average

diagnostic time for circuit Ckt3 is only slightlyhigher than that for Cktl. This slight increase isdue to Ckt3 having some shared Prelogics betweenmemories, and being 8 times larger than Cktl.This definitely proves that DIAGNOSE is quitesuited to handle BIST structure with multiplememories provided attention is given to the properdistribution of the MISRs.What happened to the 10th fault in the case of

circuit Ckt3? As it turned out, the pseudo-random test of length 100 K was insufficient todetect this fault. So, since the fault was notdetected, DIAGNOSE did not have a chance togo after it.

FAULT DIAGNOSIS 499

10. CONCLUSIONS

A method has been described for using tester-loop-independent fault simulation to diagnose faults inthe Prelogic, Postlogic, the Address Logic, and theControl Logic of an embedded memory. Asimulation event table is generated which showsthe data source of each test operation. Each eventin the table can then be simulated using the sourceprimary input values with the memory removed toidentify the failing tests. Fault simulation of thefailing tests with the appropriate simulationmodels will identify the fault or fault equivalenceclasses. When fault simulating with replicatedcopies of the Prelogic, as required by somememories, the same fault site in each of the copiesmust be provoked.The program DIAGNOSE has been written to

efficiently isolate faults in BIST designs havingembedded memories. DIAGNOSE includes an"intelligent program" running on heuristics toreduce the simulation load by suggesting either acontrol or an address logic fault. Special techni-ques for isolating these type of faults weredescribed. It was noted that these techniques mustbe tailored to the specific design of the memoryunder test.The heuristics used here require that certain

control capabilities exist over the embeddedmemories. We list these capabilities here for easeof reference.

(1) It must be possible to load any desired wordinto any address of a memory using thecorrespondence inputs. (This makes it possi-ble to load complete or partial addressesto memory before Address Logic faultisolation).

(2) It must be possible to hold the memory controllines at a constant read operation, by holdingthe D2 inputs at a fixed value, while the otherinputs (D1, D3, and D4) are at their normalpseudo-random test values. This allows a testrerun with only read operations to isolateAddress Logic faults.

(3) It must be possible to hold the D4 inputs to thePostlogic at a fixed value while all other inputs(D1, D2, and D3) are at their normal pseudo-random test values. This allows the test rerunto identify the Actual PO Values for theactivity table, and permits isolating tests atwhich the memory control line failed.

Our experiments have shown that by allowingseparate MISRs collect test responses from thedifferent memories, the diagnostic efficiency andresolution greatly increases. This feature should bean integral part of any design for diagnosability. Acost effective way of distributing the MISRs insidethe circuit is still an open question. In doing somany parameters need to be considered: hardwareoverhead, performance degradation, diagnosticefficiency, wireability, etc.

This paper concentrated on the diagnostic effortthat is initiated at the time the first test windowfails. There is a lot of merit, however, to continuethe test even after the first failing test window inorder to collect more diagnostic information. Thiswill help narrow down the set explaining the fault toa smaller area. There are several problems associa-ted with this approach. The first one being: how doyou determine which future test window has failed,since these test windows will already be tainted byan incorrect initial seed in the MISR? Solution tothis problem has been reported in [10, 11].

References

[1] Abramovici, M., Breuer, M. A. and Friedman, A. D.,Digital Systems Testing and Testable Design, IEEE Press,Piscataway, New Jersey, 1994.

[2] Bardell, P. H., McAnney, W. H. and Savir, J., Built-InTest for VLSI: Pseudorandom Techniques, Wiley Inter-science, New York, 1987.

[3] Camurati, P., Prinetto, P., Reorda, P., Barbagallo, M. S.,Burri, S. and Medina, A. (1995). Industrial BIST ofembedded RAMs. Design and Test of Computers, 12,86-95.

[4] Eichelberger, E. B. and Williams, T. W. (1978). A LogicDesign Structure for LSI Testability. J. Design Automationand Fault- Tolerant Computing, 2, 165-178.

[5] Hatzopoulos, A. A., Siskos, S. and Kontoleon, J. M., Acomplete scheme of built in self tests (BIST) structurefor diagnosis in analog circuits and systems. IEEETransactions on Instrumentation and Measurement, 42(3),689-694, June, 1993.

500 J. SAVIR

[6] Karpovsky, M. G. and Yarmolik, V. N., Transparentmemory BIST. In: Proc. of the IEEE InternationalWorkshop on Memory Technology, Design, and Test,pp. 106-111, August, 1994.

[7] McAnney, W. H. and Savir, J., There is information infaulty signatures. In: Proc. Int. Test Conf., pp. 630-636,September, 1987.

[8] Chin Tsung Mo, Chung Len Lee and Wen Ching Wu, Aself diagnostic BIST memory design scheme. In: Proc. ofthe IEEE International Workshop on Memory Technology,Design, and Test, pp. 7-9, August, 1994.

[9] Savir, J., Fast RAM test and diagnosis. In: lASTED Int’l.Conf. on Modelling, Simulation and Optimization, on CDROM, May, 1996.

[10] Savir, J., Salvaging test windows in BIST diagnostics. In:Proc. VLS1 Test Symp., pp. 416-425, April, 1997.

[11] Savir, J., Salvaging test windows in BIST diagnostics.IEEE Trans. Computers, 47(4), 486-491, Apr., 1998.

[12] Savir, J. and McAnney, W. H., Identification of failingtests with cycling registers. In: Proc. Int. Test Conf.,pp. 322-328, September, 1988.

[13] van Sas, J., Van Wauwe, G., Huyskens, E. and Rabaey,D., BIST for embedded static RAMs with coveragecalculation. In: Proc. Int. Test Conf., pp. 339-348,October, 1993.

Authors’ Biography

Dr. Savir holds a B.Sc. and an M.Sc. degree inElectrical Engineering from the Technion, IsraelInstitute of Technology, and an M.S. in Statisticsand a Ph.D. in Electrical Engineering fromStanford University. He is currently a Distin-guished Professor at New Jersey Institute ofTechnology, where he held the position of Director

of computer engineering (1996-2000), and New-ark College of Engineering Associate Dean forresearch (1999-2000). Previously with IBM, Dr.Savir was a Senior Engineer/Scientist at the IBMPowerPC Development Center in Austin, TX; atIBM Micro electronics Division in Hudson ValleyResearch Park; at IBM Enterprise Systems inPoughkeepsie, NY, and a Research Staff Memberat the IBM T. J. Watson Research Center,Yorktown Heights, N.Y. He was also an AdjunctProfessor of Computer Science and InformationSystems at Pace University, N.Y, and SUNYPurchase, N.Y.

Dr. Savir’s research interests lie primarily in thetesting field, where he has published numerouspapers and coauthored the text "Built-In Test forVLSI: Pseudorandom Techniques" (Wiley, 1987).Other research interests include design automa-tion, design verification, design for testability,statistical methods in design and test, faultsimulation, fault diagnosis, and manufacturingquality.

Dr. Savir has received four IBM InventionAchievement Awards, six IBM PublicationAchievement Awards, and four IBM PatentApplication Awards. He is a member of SigmaXi, and a fellow of the Institute of Electrical andElectronics Engineers.

International Journal of

AerospaceEngineeringHindawi Publishing Corporationhttp://www.hindawi.com Volume 2010

RoboticsJournal of

Hindawi Publishing Corporationhttp://www.hindawi.com Volume 2014


Active and Passive Electronic Components

Control Scienceand Engineering

Journal of



RotatingMachinery


Hindawi Publishing Corporation http://www.hindawi.com

Journal ofEngineeringVolume 2014

Submit your manuscripts athttp://www.hindawi.com

VLSI Design



Shock and Vibration


Civil EngineeringAdvances in

Acoustics and VibrationAdvances in



Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawi Publishing Corporation http://www.hindawi.com

Volume 2014

The Scientific World JournalHindawi Publishing Corporation http://www.hindawi.com Volume 2014

SensorsJournal of


Modelling & Simulation in EngineeringHindawi Publishing Corporation http://www.hindawi.com Volume 2014


Chemical EngineeringInternational Journal of Antennas and

Propagation




Navigation and Observation



DistributedSensor Networks


BIST-Based Fault Diagnosis Presence Embedded...

Documents

Transcript of BIST-Based Fault Diagnosis Presence Embedded...