Retrospective on “Diagnostic reasoning based on structure and behavior”

Artificial Intelligence 59 (1993) 149-157, Elsevier

ARTINT 989


Randall Davis

Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA

1. Origins

Interest in model-based reasoning arose out of the desire to capture reasoning based on an understanding (i.e., a model) of how a device works. Consider, for example, an automobile that presents the following symptoms: nothing happens when you turn the ignition key to start it, but the radio works. Even without ever having encountered that particular set of symptoms before, most people can quickly infer that the starter might be broken but the battery is ok, by reasoning from a very simple model of the structure (e.g., there is only one battery) and behavior (e.g., batteries supply power) of the device. Knowing how something is supposed to work provides a strong foundation on which to build diagnostic engines, as well as supporting test generation, design, and design for testability.

The intellectual foundation for the paper "Diagnostic reasoning based on structure and behavior" [4] was provided by a significant body of prior work in AI, including Sussman's early work on constraints [17], de Kleer's work on troubleshooting using local propagation [6], Rieger's causal models [14], and the work of Brown et al. on CAI and troubleshooting in SOPHIE [2].

The report in [3] provides a contemporary and more detailed view of the early context for this body of work. The paper on diagnostic reasoning was motivated by the goal of having a program reason successfully about a fault that introduces an unexpected interconnection in a digital circuit. Such faults pose a particularly interesting challenge for model-based reasoning, because one of the technique's central virtues is that it works from a description of structure and behavior (i.e., from the schematic), and with an unexpected connection the schematic is not an accurate description of the pathways of causality in the device.

Correspondence to: R. Davis, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. E-mail: [email protected].

0004-3702/93/$06.00 © 1993 Elsevier Science Publishers B.V. All rights reserved

2. Key intellectual contributions

The highlighted items in Sections 1 and 2 of the paper [4], while occasionally redundant, still stand the test of time reasonably well. Among the most significant are the following.

• Languages used in model-based reasoning should distinguish carefully between structure and behavior, and should provide multiple descriptions of structure, organizing it both functionally and physically.

• Diagnosis can be accomplished via the interaction of simulation and inference (later termed prediction and observation).

• Constraint suspension is a new tool for troubleshooting, capable of determining which components can be responsible for a set of observed symptoms.

• The concept of the paths of causal interaction--the mechanisms and pathways by which one component can affect another--is a primary component of the knowledge needed to do reasoning from structure and behavior.

• Model-based diagnosis is faced with an inescapable tradeoff of completeness and specificity: if all possible paths of causal interaction are considered, diagnosis becomes indiscriminate, but omitting any one pathway means that an entire class of faults will be out of reach.

• The tradeoff can be managed by layering the paths of causal interaction considered during the diagnostic process.

• An enumeration of the different kinds of pathways of interaction can in turn be produced by examining the assumptions underlying the representation.

• This layering provides a novel way to view troubleshooting, casting it as the methodical enumeration and relaxation of assumptions about the device. A system working in this way can focus its efforts initially for the sake of efficiency and diagnostic power, yet will methodically expand its focus to include a broad range of faults.

• The concept of adjacency helps in understanding why some faults are especially difficult, what it means to have a good representation, and why multiple representations are useful.
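Constraint suspension, listed among the contributions above, can be illustrated with a minimal sketch. The circuit (three multipliers M1, M2, M3 feeding two adders A1, A2), the input and observed values, and the brute-force consistency check over a small integer range are all assumptions chosen for illustration; they are not taken from the paper [4]. Each component is treated as a constraint, and a component is a candidate if suspending its constraint leaves the remaining constraints consistent with the observations.

```python
from itertools import product

# Hypothetical circuit: X = A*C, Y = B*D, Z = C*E, F = X+Y, G = Y+Z.
INPUTS = {"A": 3, "B": 2, "C": 2, "D": 3, "E": 3}
OBSERVED = {"F": 10, "G": 12}   # correct behavior would give F = 12, G = 12

# One constraint per component, expressing its correct behavior.
CONSTRAINTS = {
    "M1": lambda v: v["X"] == v["A"] * v["C"],
    "M2": lambda v: v["Y"] == v["B"] * v["D"],
    "M3": lambda v: v["Z"] == v["C"] * v["E"],
    "A1": lambda v: v["F"] == v["X"] + v["Y"],
    "A2": lambda v: v["G"] == v["Y"] + v["Z"],
}

def consistent(active):
    """True if some values of the internal wires X, Y, Z satisfy every
    active constraint together with the inputs and observations.
    Brute force over 0..20 is an illustrative shortcut, not a real solver."""
    for x, y, z in product(range(21), repeat=3):
        values = {**INPUTS, **OBSERVED, "X": x, "Y": y, "Z": z}
        if all(CONSTRAINTS[name](values) for name in active):
            return True
    return False

def single_fault_candidates():
    """Suspend each component's constraint in turn; keep those whose
    suspension restores consistency."""
    names = list(CONSTRAINTS)
    return [s for s in names if consistent([n for n in names if n != s])]

print(single_fault_candidates())
```

In this sketch only M1 and A1 survive: suspending any other single component still leaves a contradiction between the remaining constraints and the observed outputs.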


3. What was misunderstood?

3.1. What kind of model?

"Model" is sufficiently vague that the term has been used subsequently to describe a variety of different kinds of model-based systems. The models used here have several distinctive properties: they are models of structure and behavior; they are capable of both simulation and inference; they are organized around components and connections; and they are often hierarchical. As Sections 3 and 4 of the paper [4] note, structure includes both the functional and physical organization of components, while behavior is the standard black-box notion. To support model-based diagnosis, the model should be capable of both simulation (predicting outputs from inputs) and inference (inferring inputs from observations).

The model is component-oriented in the sense that it explicitly represents the components of which the device itself is composed, and interconnects them in the same way that they are interconnected in the device. In the world of digital devices, for example, a 32-bit ripple-carry adder is made up of 32 individual bit slices connected in a particular pattern. The model we build would similarly be composed of 32 subcomponents connected in the same pattern (Fig. 2 of [4] shows a 4-bit example). The nodes of this sort of model represent device components, while links represent pathways of interaction (e.g., wires) between components (see footnote 1). The models are hierarchical for all the standard reasons of efficiency that hierarchical descriptions offer.
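The component-and-connection organization of the ripple-carry adder can be sketched as follows. The function names, the naming of slices and ports, and the representation of wires as pairs of port names are hypothetical choices made for this illustration, not the representation used in [4].

```python
def full_adder(a, b, carry_in):
    """Behavior of one bit slice: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def build_structure(width):
    """Structure: one node per bit slice, one link per carry wire,
    mirroring the interconnection pattern of the device."""
    components = [f"slice{i}" for i in range(width)]
    wires = [(f"slice{i}.carry_out", f"slice{i + 1}.carry_in")
             for i in range(width - 1)]
    return components, wires

def simulate(x, y, width):
    """Propagate values slice by slice along the carry chain."""
    carry = 0
    total = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= bit << i
    return total, carry

components, wires = build_structure(32)
print(len(components), len(wires))   # 32 slices joined by 31 carry wires
print(simulate(5, 9, 4))             # 4-bit example
```

Keeping the structural description (`build_structure`) separate from the behavioral one (`full_adder`) reflects the distinction between structure and behavior that the section emphasizes.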

Given the numerous other ways in which the term "model" can be used, many other systems can properly be called model-based. But as the title of the paper [4] purposely emphasized, it was talking about reasoning from one particular variety of model, viz., a model of structure and behavior. Some subsequent work appears to have misunderstood the source of power in this approach, apparently ascribing the power to reasoning from any model, rather than to reasoning from the specific kind of model used here.
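The two directions of use described in this section, simulation and inference, can be sketched for a single component. The `Adder` class and its method names are hypothetical, and the values are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Adder:
    """Hypothetical black-box component whose behavior can be run in
    both directions."""
    name: str

    def simulate(self, a: int, b: int) -> int:
        """Simulation: predict the output from the inputs."""
        return a + b

    def infer_input(self, out: int, other_input: int) -> int:
        """Inference: deduce one input from the output and the other input."""
        return out - other_input

adder = Adder("A1")
predicted = adder.simulate(6, 6)          # what the model expects at the output
observed = 10                             # what was actually measured (invented)
discrepancy = predicted != observed       # prediction and observation disagree
implied = adder.infer_input(observed, 6)  # input value needed to explain the observation
print(predicted, discrepancy, implied)
```

A discrepancy between the simulated and observed output is what triggers diagnosis; the inferred input then becomes a prediction that can be checked against components further upstream.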

3.2. Distinguishing logical and physical possibility of candidates

As the paper notes in Section 6.2, constraint suspension in particular and model-based reasoning in general distinguish between the logical possibility of a candidate and its physical possibility, a distinction that has subsequently at times been overlooked and then rediscovered. Logical possibility tells us whether there is any set of values a component might display that could account for the observations, whereas physical possibility tells us whether a particular component is observed to fail in real use in a way that would produce the symptoms observed. For example, it might be logically possible to account for observed symptoms by hypothesizing that a wire on a circuit board is misbehaving by turning 1's into 0's and vice versa, even though there is in fact no common physical fault mode that causes a wire to start acting as an inverter.

Footnote 1: A contrasting alternative is provided by state-oriented causal models, which represent the progression of states through which a device can pass. In this case nodes indicate states and links indicate transitions from one state to another. Note that in this variety of model neither the nodes nor the links have any correspondence to physical objects.

This distinction was made initially in order to see what could be gained by reasoning only from how the device is supposed to work (i.e., reasoning only from knowledge about its correct behavior). In this view a device is a candidate if it displays anything other than its correct behavior. There are, by contrast, approaches based solely on fault models; these systems select as candidates any component with a known fault mode that would produce the observed symptom. As Section 6.2 explains, one virtue of reasoning from correct behavior is the breadth of diagnostic power it supplies: among other things, it can deal gracefully with symptoms that have never been encountered previously (e.g., those that arise from a new variety of failures).

Reasoning solely from correct behavior, however, ignores the useful constraint and focusing provided by fault models. Section 6.2 indicates one simple way in which model-based reasoning can make use of fault models: the list of candidates is pruned (perhaps during candidate generation itself) by checking to determine whether each logically possible candidate is known to fail in real use in a way that would produce the symptoms inferred for it.
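The pruning step just described can be sketched in a few lines. The component names, the behavior labels, and the table of known fault modes are all hypothetical; a real system would derive the inferred misbehavior from candidate generation rather than from a hand-written list.

```python
# Hypothetical table of fault modes known to occur in real use, by component type.
KNOWN_FAULT_MODES = {
    "wire": {"stuck_at_0", "stuck_at_1"},
    "inverter": {"stuck_at_0", "stuck_at_1"},
}

def prune(candidates):
    """Keep only logically possible candidates whose inferred misbehavior
    matches a fault mode known for that type of component."""
    return [
        (name, kind, behavior)
        for (name, kind, behavior) in candidates
        if behavior in KNOWN_FAULT_MODES.get(kind, set())
    ]

# Each candidate: (component name, component type, misbehavior inferred for it).
logically_possible = [
    ("W7", "wire", "inverts"),        # logically possible, physically implausible
    ("W3", "wire", "stuck_at_0"),
    ("N2", "inverter", "stuck_at_1"),
]
print(prune(logically_possible))
```

The wire hypothesized to act as an inverter is dropped, while the two candidates whose inferred misbehavior corresponds to a known fault mode are retained.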

The distinction between logical and physical possibility and the capacity of model-based reasoning to use fault models has at times been overlooked. As one example, the basic technique of model-based reasoning has subsequently been claimed (e.g., [11,18]) to produce conclusions of the form "light bulb B3 is faulted: it is lit although there is no voltage [across it]".

In fact the actual conclusion in that case would be that "light bulb B3 is not behaving as expected: it is lit although there is no voltage". This second phrasing is an accurate expression of what model-based reasoning actually indicates, exhibiting the care it takes in distinguishing between "not behaving as expected" and "faulted". Not behaving as expected often means that the component is faulted, but the two are not synonymous: there are many other reasons why the component might not behave as expected (e.g., it is not wired up in the way the schematic indicates). Hence the distinction matters and has been a part of the model-based reasoning approach from early on.

3.3. Distinguishing fault modes and physical failures

A fault mode is the behavior produced by some variety of physical failure. One common fault mode in digital electronics, for example, is "stuck at 0", in which a wire always carries a zero, no matter what we attempt to put on it. This behavior can arise from a variety of different physical events: the wire may be shorted to ground; it may be cut (and hence disconnected from any driving signal), which in some digital technologies produces a zero, etc. In the paper being reviewed here, the interesting physical failure was the inadvertent wiring together of two adjacent pins on a chip by a pool of solder (called a bridge fault); the fault mode produced is the and-gate behavior shown in Fig. 19 of the paper.
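The relation between physical failures and fault modes can be made concrete with a small sketch. The names below are hypothetical, and modeling the bridge as a wired AND of the two driven values is an assumption based on the and-gate behavior described above.

```python
# Hypothetical mapping from physical failures to the fault modes they produce.
# Several distinct physical events can yield the same behavior.
FAULT_MODE_OF = {
    "short_to_ground": "stuck_at_0",
    "cut_wire": "stuck_at_0",      # only in technologies where an open line reads as 0
    "solder_bridge": "wired_and",
}

def stuck_at_0(driven):
    """The wire carries 0 regardless of the value driven onto it."""
    return 0

def wired_and(driven_a, driven_b):
    """Two bridged wires both carry the AND of the values driven onto them."""
    shared = driven_a & driven_b
    return shared, shared

def physical_causes(fault_mode):
    """All physical failures in the table that produce the given behavior."""
    return sorted(c for c, mode in FAULT_MODE_OF.items() if mode == fault_mode)

print(physical_causes("stuck_at_0"))
print([wired_and(a, b) for a in (0, 1) for b in (0, 1)])
```

The mapping is many-to-one: a behavior such as stuck-at-0 does not by itself identify the physical event that caused it.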

The paper thus distinguishes between the physical event and its behavioral consequences. The distinction matters for several reasons, one of which arises when probabilities are used to guide diagnosis: the probability of a physical event is a reasonably well-defined concept, but the probability of a behavior is considerably less obvious. Subsequent work using probabilities has at times glossed over this distinction.

3.4. Models and rules

Some work has investigated the notion of "turning models into rules", often on the grounds that model-based reasoning is supposedly slow, while rule-based systems are allegedly faster. As noted at some length elsewhere [5], this appears to be a confusion of form and content, apparently assuming that the power in a representation arises primarily from its form (e.g., a conditional statement) rather than its content (empirical association versus a description of structure and behavior). Rule-based systems capture one form of knowledge, while model-based systems represent a different kind of knowledge and use it in a correspondingly different way. Re-writing model-based reasoning as conditional statements is neither surprising (Post productions have long been known to be Turing equivalent) nor particularly useful (because speed and power arise primarily from the content, not the form of the knowledge).

3.5. Diagnosis as a process

AI work on model-based diagnosis, and indeed diagnosis in general, has attempted to capture the diagnostic process. That is, we not only want to determine what experts know and what answers that leads to (i.e., the epistemology of diagnosis), but also want to understand the process by which they arrive at that result. This emphasis on process is important for several reasons: It provides insight into one important form of human reasoning; it makes automated reasoners more transparent (and easier to build and hence more acceptable to users); and it is a source of significant reasoning power (expert diagnosticians are considerably better than any program on complex circuits or systems).


Hence efforts to recast model-based reasoning in logic, while useful as a way of studying issues of the epistemology and semantics of the task, also have the nontrivial problem of losing almost all contact with the process of diagnosis. Work attempting to view diagnosis as nonmonotonic inference may be an interesting challenge to logic and may offer some insight about the problem, but it often obscures distinctions that are important in understanding the diagnostic reasoning process.

4. Open issues

A number of the open issues mentioned in the paper remain subjects of active work, including: scaling up to realistically large devices, facilitating model construction, doing model selection, and handling analog devices.

4.1. Model construction

Perhaps the most significant pragmatic barrier to the routine industrial use of this technology is the difficulty of building a device model. Several factors make the task daunting for many real devices:

(i) it requires an exhaustive, explicit reconstruction of the design of the entire device, often needing information not contained in available documentation,

(ii) there is often a large volume of information to be captured, and

(iii) the information is often described in numerous informal languages (e.g., the precision of schematics soon gives way to informal block diagrams with many ill-defined sorts of arrows).

Hence even for devices as apparently simple as a personal computer, the modeling task is a challenging one of intellectual archeology (reconstructing the design details) and translation (from informal to more precise descriptions).

We need to make this process both easier and more intuitive. At the purely pragmatic level, simply being able to read and translate existing CAD files reduces the amount of manual work involved. More importantly, we need a better understanding of both the nature of the end-product (the model) and the process by which informal descriptions are translated into such a model.

4.2. Model selection

Given a basic knowledge of how to use models of structure and behavior in diagnosis, it is intriguing to push the process back one step: How are models selected to begin with? Since all devices can be viewed from multiple perspectives, how do we decide which view is appropriate in any given circumstance? This is the sort of reasoning that goes on in the heads of engineers before the equations or block diagrams ever hit the page. Interesting starts on this problem have been made (e.g., [1,9,19], and others), but much remains to be done.

4.3. Scaling

Scaling model-based reasoning techniques to deal with realistic devices involves handling increases in both the number of components and the complexity of their behavior. Of the two, structure appears somewhat easier: some early work reported diagnosing a system of 2000 very simple components [10], while more recent work [7] handles multiple faults in devices with up to 5000 gates by doing most-probable-first generation of diagnoses.

Difficult problems arise from attempts to deal with complex behavior. The work in [12], for instance, describes the development of a vocabulary of coarse temporal abstractions that enables troubleshooting of devices whose behavior extends over many thousands of clock cycles. This set of abstractions reduces what would have been an untenable amount of detail to a level that can be handled with existing machinery. There appears to be considerable power in such abstractions and some evidence that one of the best sources for them is the vocabulary used by people who routinely solve the same task. The intuition here is that cognitive limitations force human experts to invent vocabularies that make problems tractable and that our systems can benefit significantly from adopting the same vocabularies. In the area of device behavior description, as with many others, we have only begun to accumulate the appropriate vocabulary.

4.4. Reasoning about analog devices

In some ways modeling the digital world is particularly easy: Considerable computational simplicity arises from reasoning with the discrete, finite set of values needed, and from the inherent directionality (i.e., distinguished inputs and outputs) of the devices. Conversely, a number of known difficulties arise in dealing with analog devices, because of the infinite set of continuous real numbers needed to model them and the non-directional nature of their behavior.

Problems in modeling include dealing with inexactness (e.g., components good to ±10%), the necessity of propagating intervals instead of integers (and of doing interval arithmetic), and the large number of predictions generated (e.g., at nodes with multiple wires attached). Non-directional behavior of devices means that there is no notion of focusing on only those components causally "upstream" of a symptom. In principle, any component in an analog circuit can be responsible for any symptom. Circuits are designed with stages in order to localize the effect of components, but little use is currently made of this information.
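The need to propagate intervals rather than single values can be illustrated with a minimal sketch. The helper names, the resistor and current values, and the use of Ohm's law as the component behavior are assumptions made for this example.

```python
def with_tolerance(nominal, fraction):
    """Interval (low, high) for a component good to the given tolerance."""
    return (nominal * (1 - fraction), nominal * (1 + fraction))

def interval_mul(a, b):
    """Product of two intervals: the extremes lie among the endpoint products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def contains(interval, value):
    """True if a measured value is compatible with the predicted interval."""
    return interval[0] <= value <= interval[1]

resistance = with_tolerance(1000.0, 0.10)    # ohms, nominal 1000 with 10% tolerance
current = (0.009, 0.011)                     # amperes, hypothetical predicted range
voltage = interval_mul(current, resistance)  # predicted range from V = I * R

print(voltage)
print(contains(voltage, 10.0))   # measurement inside the predicted range
print(contains(voltage, 13.0))   # measurement outside the range: a discrepancy
```

In this setting a symptom is a measurement that falls outside the predicted interval rather than one that differs from a single expected value, so wide tolerances weaken the ability to detect discrepancies.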

5. The future of the work

Numerous papers can be seen as reflecting the future of this line of work, many arising from independent efforts at other laboratories pursuing similar interests. A few examples illustrate various threads of work.

Work on GDE [8] offered the ability to diagnose multiple faults in ways that minimized the inherent exponential nature of the problem. It also displayed the use of probabilities of component failure, both to focus candidate generation and to select good probe points. Later work in this vein (e.g., [7]) showed that a probability-based search could allow the system to use correct behavior and fault mode information and still keep the combinatorics manageable.
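The use of component failure probabilities to order candidates can be sketched as follows. This is only an illustration under stated assumptions: the component names and prior values are invented, failures are assumed independent, and the score is an unnormalized prior weight rather than the probability computation actually used in [7] or [8].

```python
from math import prod

# Hypothetical prior failure probabilities for five components.
PRIOR = {"M1": 0.02, "M2": 0.02, "M3": 0.02, "A1": 0.01, "A2": 0.01}

def candidate_weight(faulty):
    """Prior weight of a candidate: the named components have failed and
    all others are working, assuming independent failures."""
    return prod(p if name in faulty else 1.0 - p for name, p in PRIOR.items())

def rank(candidates):
    """Order candidates from most to least probable."""
    return sorted(candidates, key=candidate_weight, reverse=True)

# Hypothetical candidate fault sets, each assumed consistent with the symptoms.
candidates = [{"M2", "A2"}, {"A1"}, {"M2", "M3"}, {"M1"}]
ranked = rank(candidates)
print(ranked)
```

With these priors the single-fault candidates rank ahead of the double-fault ones, which is the kind of focusing that lets a search examine likely diagnoses first.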

The diagnostic problem was recast in formal terms in [13], where it was viewed as nonmonotonic inference in which the goal was to find the minimal set of abnormal components.
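In that formal setting, minimal diagnoses can be characterized as the minimal hitting sets of the conflict sets, where a conflict set is a group of components that cannot all be behaving correctly given the observations. The sketch below assumes the conflict sets are already known; the component names and conflicts are hypothetical, and the exhaustive enumeration is a stand-in for illustration, not the algorithm of [13].

```python
from itertools import combinations

def minimal_hitting_sets(conflicts):
    """Enumerate subsets by increasing size; keep each subset that intersects
    every conflict set and has no previously found subset inside it."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(len(universe) + 1):
        for subset in combinations(universe, size):
            candidate = frozenset(subset)
            if any(smaller <= candidate for smaller in found):
                continue  # a smaller diagnosis already covers this one
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found

# Hypothetical conflict sets: in each, at least one component must be abnormal.
conflicts = [
    frozenset({"M1", "M2", "A1"}),
    frozenset({"M1", "M3", "A1", "A2"}),
]
diagnoses = minimal_hitting_sets(conflicts)
print([sorted(d) for d in diagnoses])
```

Because subsets are examined in order of increasing size and supersets of known diagnoses are skipped, every set returned is minimal.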

Work in [15] demonstrated that setting test generation in the context of model-based reasoning offered significant power, while [20] showed how design for testability could be similarly enhanced.

Work on XDE [12] offered the notion of coarse temporal abstractions as one illustration of the need for good abstractions as a means of dealing with complex behavior.

Finally, the GORDIUS program [16] offered the notion of generate, test, and debug as a problem solving paradigm, based in part on an understanding of the different kinds of knowledge embodied in empirical associations on one hand and models of structure and behavior on the other, and an understanding of how each form of knowledge could be used to support the other.

References

[1] S. Addanki, R. Cremonini and J.S. Penberthy, Graphs of models, Artif. Intell. 51 (1-3) (1991) 145-177.

[2] J.S. Brown, R. Burton and J. de Kleer, Knowledge engineering and pedagogical techniques in SOPHIE I, II, and III, in: D.H. Sleeman and J.S. Brown, eds., Intelligent Tutoring Systems (Academic Press, New York, 1982).

[3] R. Davis, Expert systems: Where are we? And where do we go from here? AI Mag. 3 (2) (1982) 3-22.

[4] R. Davis, Diagnostic reasoning based on structure and behavior, Artif. Intell. 24 (1984) 347-410.


[5] R. Davis, Form and content in model-based reasoning, in: Proceedings IJCAI-89 Workshop on Model-Based Reasoning, Detroit, MI (1989).

[6] J. de Kleer, Local methods of localizing faults in electronic circuits, AI Memo 394, MIT, Cambridge, MA (1976).

[7] J. de Kleer, Focusing on probable diagnoses, in: Proceedings AAAI-91, Anaheim, CA (1991) 842-848.

[8] J. de Kleer and B.C. Williams, Diagnosing multiple faults, Artif. Intell. 32 (1987) 97-130.

[9] B. Falkenhainer and K.D. Forbus, Compositional modeling: finding the right model for the job, Artif. Intell. 51 (1-3) (1991) 95-143.

[10] M.B. First, B.J. Weimer, S. McLinden and R.A. Miller, LOCALIZE: computer-assisted localization of peripheral nervous system lesions, Comput. Biomed. Res. 15 (6) (1982) 525-543.

[11] G. Friedrich, G. Gottlob and W. Nejdl, Physical impossibility instead of fault models, in: Proceedings AAAI-90, Boston, MA (1990) 331-336.

[12] W.C. Hamscher, Modeling digital circuits for troubleshooting, Artif. Intell. 51 (1991) 223-271.

[13] R. Reiter, A theory of diagnosis from first principles, Artif. Intell. 32 (1987) 57-95.

[14] C. Rieger and M. Grinberg, The declarative representation and procedural simulation of causality in physical mechanisms, Tech. Rept. TR-512, University of Maryland, College Park, MD (1977).

[15] M. Shirley, Generating tests by exploiting designed behavior, in: Proceedings AAAI-86, Philadelphia, PA (1986) 884-890.

[16] R. Simmons and R. Davis, Generate, test and debug: combining associational rules and causal models, in: Proceedings IJCAI-87, Milan, Italy (1987) 1071-1078.

[17] R.M. Stallman and G.J. Sussman, Forward reasoning and dependency-directed backtracking in a system for computer-aided circuit analysis, Artif. Intell. 9 (1977) 135-196.

[18] P. Struss and O. Dressler, "Physical negation": integrating fault models into the general diagnostic engine, in: Proceedings IJCAI-89, Detroit, MI (1989) 1318-1324.

[19] D.S. Weld, Automated model switching, in: Proceedings Third Workshop on Qualitative Physics (1989).

[20] P. Wu, Design for testability, in: Proceedings AAAI-88, St. Paul, MN (1988) 358-363.
