A General Testability Theory: Classes, Properties, Complexity, and Testing Reductions

Ismael Rodríguez, Luis Llana, and Pablo Rabanal

Abstract: In this paper we develop a general framework to reason about testing. The difficulty of testing is assessed in terms of the number of tests that must be applied to determine whether the system is correct or not. Based on this criterion, five testability classes are presented and related. We also explore conditions that enable and disable finite testability, and study their relation to testing hypotheses. We measure how far incomplete test suites are from being complete, which allows us to compare and select better incomplete test suites. The complexity of finding that measure, as well as the complexity of finding minimum complete test suites, is identified. Furthermore, we address the reduction of testing problems to each other, that is, we study how the problem of finding test suites for systems of some kind can be reduced to the problem of finding test suites for another kind of system. This enables testing methods to be exported from one setting to another. In order to illustrate how general notions are applied to specific cases, many typical examples from the formal testing techniques domain are presented.

Index Terms: Formal testing techniques, general testing frameworks


1 INTRODUCTION

Testing consists of checking the correctness of a system by interacting with it. Typically, the goal of this interaction is to check whether an implementation fulfills a property or a specification. If the specification is formally defined, then procedures for deriving tests, applying tests, and assessing the outputs collected by tests can be formal and systematic [5], [13], [23], [27], [29]. There exist myriad formal testing methodologies, each one focusing on checking the correctness of a different kind of system (e.g. labeled transition systems [5], [14], [33], [34], temporal systems [3], [22], [26], [31], probabilistic systems [24], [32], Java programs [6], [12], etc.). Some methods focus on testing a part of the behavior considered critical. Other methods aim at constructing complete test suites, that is, sets of tests such that, after applying them to the implementation, the results allow us to precisely determine whether the implementation is correct or not with respect to the specification. However, a finite complete test suite is not always available. For example, checking the correctness of a non-deterministic machine could be impossible regardless of how many tests one applies, because a given behavior could remain hidden for an arbitrarily long time. Even if a machine is deterministic, we could need to apply infinitely many tests if the number of available ways to interact with the implementation is infinite. In some cases where infinitely many tests are required, it could be the case that we can achieve any arbitrarily high degree of partial completeness with some finite test suite, thus enabling a kind of unboundedly-approachable completeness, rather than completeness. Alternatively, if the behavior of the system depends on temporal conditions and time is assumed to be continuous, then we could need to check what happens when a given input is produced at all possible times, thus requiring an uncountably infinite set of tests.

Since the feasibility of a testing method depends on our capability to test systems in finite time, investigating the conditions that enable or disable the existence of finite complete test suites in each case is a major issue in testing. In this regard, several works have studied the effect of assuming some hypotheses about the implementation [18], [19], [29]. Assuming hypotheses reduces the number of systems that could actually be the implementation, so fewer tests must be applied to look for undesirable behaviors. In some specific cases this may even reduce the number of tests necessary for completeness from infinite to finite [2], [15].

However, the general conditions that make the difference between requiring infinite or finite sets of tests (regardless of the use of hypotheses) are still unknown. In this paper, by testability we will understand how difficult it is to test systems, measured in terms of the size required by test suites to be complete. We think that the conditions leading to the testability cases mentioned above must be analyzed, that is, whether complete test suites are:

a) finite;
b) infinite, but any arbitrarily high degree of partial completeness can be finitely achieved;
c) countably infinite;
d) uncountably infinite;
e) or there does not exist any complete test suite at all.

To the best of our knowledge, testability scenarios (b), (d), and (e) have never been defined or investigated. A few hypotheses enabling finite complete test suites for some particular models, such as finite state machines, have been reported [5], [23], [27], [29], and some approaches to formally define cases (a) and (c) have been proposed [7], [8]. However, as we will discuss later in the related work section, they lack the generality to represent many feasible testing scenarios. Moreover, little is known about the conditions that must hold to make a testing problem fall into each of these cases, or about the borders between them. This contrasts with other mature fields of computer science like Computability or Complexity, where a well-established theory allows us to relate different known problems to each other, as well as to classify them into a known hierarchy. Unfortunately, the lack of such formal roots makes the field of Formal Testing Techniques a bit disorganized, and techniques are not easily inherited from one problem to another, even if the problems are quite similar after abstracting away factors that do not directly affect testability.

The authors are with the Department of Sistemas Informáticos y Computación, Universidad Complutense de Madrid, Madrid 28040, Spain. E-mail: {isrodrig, llana}@sip.ucm.es, [email protected].

Manuscript received 30 Jan. 2013; revised 31 May 2014; accepted 11 June 2014. Date of publication 17 June 2014; date of current version 18 Sept. 2014. Recommended for acceptance by P. Inverardi. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TSE.2014.2331690

862 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 40, NO. 9, SEPTEMBER 2014

0098-5589 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

In this paper we propose a first step towards the construction of a general testability theory. We will present some formal criteria to classify testing problems according to their testability. Providing a complete testability hierarchy is a huge task and is beyond the scope of this paper. Instead, we will introduce a hierarchy including only the five main testability cases listed above. We will also present some formal properties of testability scenario (a) (later called the finitely testable class), such as conditions required for finite testability, alternative characterizations, transformations preserving testability, the effect of adding testing hypotheses, methods to reduce one testing problem to another, the complexity of finding a minimum complete test suite, and the complexity of measuring the completeness degree of incomplete test suites. We will apply these properties to study the testability of some examples. A deeper study of the properties of the remaining four testability classes is left for future work.

1.1 Related Work

In this section we present some previous proposals and use the presentation to summarize their main differences from ours. As we said before, there is little work dealing with testing from an abstract and general point of view. Among the literature on this subject, it is worth mentioning [7], [8], [10], [25], [30], [35].

In [7], Cherniavsky and Smith propose a recursion theoretic approach to testing, and apply concepts from inductive inference by relating program testing and inductive inference. An abstract formalism to represent testing is presented. The authors show that, for recursively enumerable sets of functions, inference is more difficult than testing, and study the relationship between testable sets and recursively enumerable sets. The proposed model assumes that programs receive a natural number as input and produce another natural number as output. An adequacy criterion (for us, testing completeness) is defined as follows: outputs collected by a finite test suite must allow the tester to uniquely determine which model is the implementation under test (IUT), given a set of models that could be the IUT. In contrast, our own notion of completeness will impose a weaker requirement: given a set of models (functions) representing all possible definitions of the IUT, and a subset representing those that are considered correct, a test suite is complete if the outputs obtained after applying it necessarily let us determine precisely whether the IUT belongs to the latter set or not. Thus, determining which one is the specific model of the IUT is irrelevant once we have determined on which side of the correctness/incorrectness border the IUT lies. Hence, our interpretation of testing case (a), as proposed in the introduction, is different. Note that considering a set of correct definitions of the IUT, rather than a single correct definition, is natural in many cases: specifications can be partially unspecified and, moreover, sometimes acceptable implementations may be mutually exclusive or denote behaviors which are incompatible with other acceptable implementations (see footnote 1). Besides, only deterministic and total functions are considered in [7], whereas our models explicitly denote non-determinism (functions may return different non-deterministic outputs, and our completeness criterion requires that correctness can be determined regardless of which outputs are non-deterministically chosen by the IUT) as well as non-termination (e.g. the tester can declare that emitting a and then not terminating is indistinguishable from emitting a and then terminating; more generally, our model allows testers to explicitly determine which pairs of outputs can or cannot be distinguished). Regarding the distinction of cases (a)-(e) mentioned before, in [7] only case (a) is defined (with the important differences noted above), and the properties given in [7] focus only on comparing testing and inference.
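As a rough illustration of this weaker completeness notion, the following Python sketch enumerates the deterministic behaviors of the drinks machine of footnote 1 and checks which sets of buttons separate every correct behavior from every incorrect one. The pairwise reading of completeness used in `is_complete`, the restriction to deterministic machines, and all names are assumptions made for illustration only; the formal definitions are given in later sections.

```python
from itertools import product

INPUTS = ("A", "B")
DRINKS = ("orange", "lemon")

# Every deterministic candidate behavior: button -> set of possible drinks.
ALL_MACHINES = [{"A": {a}, "B": {b}} for a, b in product(DRINKS, repeat=2)]


def is_correct(machine):
    """Correct iff the two buttons unambiguously select different drinks."""
    return machine["A"] != machine["B"]


def distinguishable(outs1, outs2):
    """Plain inequality lifted to sets: every pair of outputs must differ."""
    return all(o1 != o2 for o1 in outs1 for o2 in outs2)


def is_complete(suite, machines, correct):
    """Assumed reading of completeness: every (correct, incorrect) pair of
    candidate behaviors is separated by at least one input of the suite."""
    good = [m for m in machines if correct(m)]
    bad = [m for m in machines if not correct(m)]
    return all(
        any(distinguishable(g[i], b[i]) for i in suite)
        for g in good
        for b in bad
    )
```

Under these assumptions, `is_complete(["A"], ALL_MACHINES, is_correct)` is false, whereas `is_complete(["A", "B"], ALL_MACHINES, is_correct)` is true.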

Based on [7], Cherniavsky and Statman later developed a game theoretic approach to testing, as well as a theory of testing in the limit, in [8]. Let us note that the proposed notion of testing in the limit is not related to case (b) mentioned before (i.e. unboundedly-approachable testing), but it is strongly related to case (c) (i.e. countably infinite testing). In [8] the word limit refers to the fact that, if we were able to apply all test cases belonging to a countably infinite set, then completeness would be achieved. In contrast, our notion of unboundedly-approachable testing (case (b)) will require that any non-full completeness measure can be achieved with some finite test suite, which will be a subtle intermediate point between cases (a) and (c) (in fact, stronger than (c) and weaker than (a)).

In [10], Davis and Weyuker outline a theoretical model for the notion of test adequacy for a given program. Contrary to our proposal, which will be black-box oriented, a white-box testing approach is considered. A test set is said to be adequate for a given program if the input-output behavior of the program for the given test set distinguishes the program from the rest of the programs whose input-output behavior is not identical to the first one. This idea is generalized by providing a nearness definition as a measure of the distance between programs. Their model is presented as a programming language that receives one input and generates one output, and a grammar definition is given to measure this distance. The authors also discuss a hierarchy of adequacy notions, based on the distance between two programs, and define the minimal adequation of a set for a program.

1. Let us consider a drinks machine that sells orange and lemon juices. It has two buttons, A and B. The machine will be considered correct as long as the buttons unambiguously allow users to request orange or lemon drinks. Thus, an implementation where button A always returns orange and B always returns lemon is correct. Moreover, an implementation where A always returns lemon and B always returns orange would be correct as well. Any other behavior is not permitted. Both behaviors are incompatible, so considering them as two separate acceptable definitions is a natural choice.

Another paper dealing with the complexity of testing different classes of programs in white-box testing is [30], where Romanik and Vitter define a measure of testing complexity called VCP-dimension. In this work, the authors give upper and lower bounds on the number of test points needed to determine if a program is approximately correct or to distinguish one program from all other programs in the same class. These test points are given as input/output pairs. This measure is applied to different classes of programs: straight line programs, if-then-else statements, for loops, and combinations of the previous schemes.

In [35], Zhu and He present a methodology for testing high-level Petri nets in order to test concurrent software systems. The syntax and semantics of the model are introduced, and then the model is used to study four different types of testing strategies for Petri nets: transition-oriented, state-oriented, flow-oriented, and specification-oriented. Each method is formally defined by an observation scheme and an adequacy criterion. In particular, test adequacy is defined in terms of the coverage for the considered criterion.

In [25], Meinke presents a mathematical framework for black-box testing. The model defines a functional requirement as a precondition on the input data and a postcondition on the output data of a system. This approach allows testers to treat coverage, test termination and reliability modeling, as well as test case generation. A program is said to fail a test if the test satisfies the precondition, the program terminates on the input, and the resulting output assignment does not satisfy the postcondition. In this way, Meinke defines program testing as the search for counterexamples to program correctness, and introduces a probabilistic model to compute the probability that a specification fails a test after passing n tests. Thus, the probability of correctness of a program is calculated, which can be used to generate good test cases to find errors.

1.2 Contributions

Next we summarize the main contributions of this work. These notions will be explained in detail in subsequent sections.

• A general framework to define testing scenarios is given. It provides enough generality to enable the representation of many different testing scenarios, as we will illustrate with many typical examples from the formal testing techniques domain. Our efforts will primarily focus on tackling the most challenging part of testing, that is, guaranteeing that an incorrect IUT will necessarily emit outputs that let us distinguish it from any possible correct definition (which provides exhaustiveness), though our completeness notion will not only require this, but also that a correct IUT will emit outputs that let us distinguish it from any possible incorrect definition (which provides soundness). In comparison to other general frameworks, ours has the following features:

  1) In order to provide generality, no specific structure of systems (states, transitions, code lines, etc.) is assumed. Behaviors are represented by functions relating inputs and outputs, in a black-box manner.

  2) Designers can explicitly denote that there are several correct definitions rather than a single one, which is a natural approach to under-specification and mutually exclusive valid definitions. More importantly, the goal of testing is determining whether the IUT belongs to the set of correct definitions, which is a weaker requirement than uniquely identifying the IUT from a set of possible definitions (as typically considered in the literature).

  3) Testers can explicitly specify which pairs of outputs are (un-)distinguishable through observation. This enables a natural representation of observation issues such as non-termination and imprecise observation of magnitudes.

  4) The non-determinism of systems is explicitly denoted. Moreover, non-determinism explicitly affects the definition of completeness: a test suite is considered complete only if, after applying it to the IUT, we will always be able to decide whether the IUT is correct or not (regardless of the non-deterministic choices actually made by the IUT).

• A classification of testing problems into five classes is provided. This classification includes classes which have been introduced before in the literature but had a different definition (cases (a) and (c)), as well as other classes which have never been proposed to the best of our knowledge, including the class where unboundedly-approachable testing is possible (case (b)).

• Properties of the first class (case (a)) are extensively studied, including alternative characterizations and techniques to export methods between testing problems.

• The particular case where the set of possible definitions of the IUT is finite is analyzed. The complexity of problems such as finding the minimum complete test suite, and finding the least hypothesis that makes test suites complete in several scenarios, is studied. Most of them (but not all) turn out to be NP-complete problems.

• In order to put the proposed theoretical ideas into practice, we develop a few elaborate case studies where our framework lets us discover new interesting properties about well-known testing settings (e.g. the inclusion of a part of M. Hennessy's testing framework [17] into case (b)) and other useful models (e.g. the classification of testing time-out machines with rational times into the same case (b), and the classification of testing several variants of machines handling increasing continuous magnitudes in case (a), which is discovered through a series of testing reductions).

1.3 Paper Structure

This paper is structured as follows. In the next section we introduce some basic concepts to reason about testability, and in Section 3 we present the classes of our testability hierarchy. Next, in Section 4 we present some properties of the first of these classes, the finite testability class. We focus on studying those aspects that enable or disable finite testability, the complexity of some problems, the effect of assuming testing hypotheses, and the notion of testing reduction. Later, Section 5 presents our case studies. For the sake of readability, the lengthiest proofs (in particular, those concerning results about complexity) are presented in a separate section, Section 6. In Section 7 we present the conclusions of the paper and outline some lines of future work.

2 TESTABILITY CONCEPTS

In this section we introduce some preliminary concepts and define testability classes. A general notion to denote implementations and specifications is presented. In particular, we introduce an abstraction of a computation device (program, hardware system, etc.) as a machine that takes an input from a set of inputs and computes an output. Let us recall that testing consists of observing the outputs of an implementation and comparing them with those allowed by the specification. However, there might be different outputs that are not distinguishable under observation, for instance in the presence of imprecise measurement or non-termination (see the forthcoming Example 1). Therefore, we need to introduce a notion of distinguishable outputs.

Definition 1. Let $O$ be a set of outputs.

• A distinguishing relation for $O$ is an anti-reflexive symmetric binary relation $\not\leftrightarrow$ over $O$. We denote the complement of $\not\leftrightarrow$ by $\leftrightarrow$.

• A set of distinguished outputs is a set $O$ of outputs with a distinguishing relation $\not\leftrightarrow$.

• We extend the relation $\not\leftrightarrow$ to sets of outputs as follows. Let $O', O'' \subseteq O$. We say that $O'$ and $O''$ are distinguishable, denoted by $O' \not\leftrightarrow O''$, iff $o' \not\leftrightarrow o''$ for all $o' \in O'$ and $o'' \in O''$. Again, the relation $\leftrightarrow$ is the negation of $\not\leftrightarrow$.
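A minimal Python sketch of Definition 1 for finite output sets, assuming a distinguishing relation is supplied as a two-argument predicate. The function names are illustrative only, and plain inequality is used as the default relation.

```python
from itertools import product


def trivial_distinguishes(o1, o2):
    """Plain inequality: two outputs are distinguishable iff they differ."""
    return o1 != o2


def is_distinguishing_relation(outputs, distinguishes):
    """Check anti-reflexivity and symmetry over a finite set of outputs."""
    anti_reflexive = all(not distinguishes(o, o) for o in outputs)
    symmetric = all(
        distinguishes(a, b) == distinguishes(b, a)
        for a, b in product(outputs, repeat=2)
    )
    return anti_reflexive and symmetric


def sets_distinguishable(outs1, outs2, distinguishes=trivial_distinguishes):
    """Lift the relation to sets: every cross pair must be distinguishable."""
    return all(distinguishes(a, b) for a in outs1 for b in outs2)
```

For instance, `sets_distinguishable({"x"}, {"y", "z"})` holds, while `sets_distinguishable({"x", "y"}, {"y"})` does not, because the shared output `"y"` cannot be told apart from itself.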

Hereafter, when no distinguishing relation is explicitly given, we will assume the trivial inequality: $o_1 \not\leftrightarrow o_2$ iff $o_1 \neq o_2$. The next example illustrates the previous definition.

Example 1. Let us consider a testing scenario where all outputs can be distinguished from each other. Then we may consider a trivial distinguishing relation $\not\leftrightarrow$ where two outputs $o_1, o_2 \in O$ are distinguishable iff they are different, i.e. $o_1 \not\leftrightarrow o_2$ iff $o_1 \neq o_2$.

This strong distinguishing capability may not be feasible in other frameworks. Let us consider programs which may not terminate, and let us assume that the time at which messages are produced is not represented in collected observations. For instance, the output $o_1 = mes_1 \cdot mes_2 \cdot stop$ denotes that the IUT produces a message $mes_1$, next produces $mes_2$, and next terminates ($stop$), whereas the output $o_2 = mes_1 \cdot \bot$ denotes that the IUT produces $mes_1$ and next is stuck forever, denoted by $\bot$. In practice, $o_1$ is not effectively distinguishable from $o_2$ via observation because, if $mes_1$ is produced, then we cannot guarantee that $mes_2$ will not be observed later, no matter how much time passes. Consequently, we may consider the following (effective) distinguishing relation: we have $o' \not\leftrightarrow o''$ iff (i) $o' \neq o''$; and (ii) neither $o' = w \cdot \bot$ nor $o'' = w \cdot \bot$, where $w$ is the longest common prefix of $o'$ and $o''$. Note that (ii) is not met by $o_1$ and $o_2$ above, so they are not distinguishable in finite time, i.e. $o_1 \leftrightarrow o_2$. In fact, given this definition of $\not\leftrightarrow$, if we observe $o \in \{o', o''\}$ and we have $o' \not\leftrightarrow o''$ then we can decide whether $o = o'$ or $o = o''$ in finite time (see footnote 2). The tester could also represent deadlocks and livelocks with different symbols if required (e.g. $\bot_1$ and $\bot_2$), and define them as distinguishable or not depending on whether, e.g., the tester can monitor consumed processing resources or not.
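The effective relation described above can be sketched in Python as follows, assuming outputs are encoded as tuples of symbols and that the hypothetical marker `BOTTOM` stands for the "stuck forever" symbol. This is only one possible encoding, not a definitive implementation.

```python
BOTTOM = "BOTTOM"  # hypothetical marker for "stuck forever"


def common_prefix(a, b):
    """Longest common prefix of two tuples."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return a[:n]


def effectively_distinguishable(o1, o2):
    """Conditions (i) and (ii): the outputs differ, and neither of them is
    the common prefix followed immediately by non-termination."""
    if o1 == o2:
        return False
    stuck = common_prefix(o1, o2) + (BOTTOM,)
    return o1 != stuck and o2 != stuck
```

With `o1 = ("mes1", "mes2", "stop")` and `o2 = ("mes1", BOTTOM)`, the function reports the two outputs as not distinguishable, matching the discussion of $o_1$ and $o_2$ in the example.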

Now let us assume that the time when each message is produced is indeed denoted in outputs. For instance, the output $\langle (1.2, mes_1), (2.7, mes_2), (3.2, stop) \rangle$ denotes that $mes_1$ is produced at time 1.2, then $mes_2$ is produced at time 2.7, and next the system stops at time 3.2. In this case, non-termination issues would not affect the distinguishability of outputs. Let us consider $o = \langle (t_1, o_1), \ldots, (t_n, o_n) \rangle \in O$. If time $t_i$ is reached and $o_i$ is not observed then we know for sure that we are not observing $o$. Thus, in this case we may use the following criterion: we have $o' \not\leftrightarrow o''$ iff $o' \neq o''$.

We could also consider a scenario where two outputs cannot be distinguished if they have close, but not identical, time stamps. Thus, two observations $o = \langle (t_1, o_1), \ldots, (t_n, o_n) \rangle \in O$ and $o' = \langle (t'_1, o'_1), \ldots, (t'_m, o'_m) \rangle \in O$ would be distinguishable, i.e. $o' \not\leftrightarrow o$, iff $n \neq m$ or, for some $i$, $o_i \neq o'_i$ or $|t_i - t'_i| > \varepsilon$ for some given threshold $\varepsilon$.
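The tolerance-based criterion for timed observations can be sketched as below. The encoding of an observation as a tuple of `(time, symbol)` pairs and the parameter name `eps` are assumptions made for this illustration.

```python
def timed_distinguishable(o1, o2, eps):
    """Two timed observations are distinguishable iff their lengths differ,
    or some position differs in its symbol or by more than eps in time."""
    if len(o1) != len(o2):
        return True
    return any(
        a != b or abs(t1 - t2) > eps
        for (t1, a), (t2, b) in zip(o1, o2)
    )
```

For example, an observation whose first message arrives at time 1.25 instead of 1.2 is not distinguishable from the original under `eps=0.1`, but it is under `eps=0.01`.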

Finally, let us note that the non-distinguishing relation $\leftrightarrow$ may be non-transitive. Let us suppose that testers check an analog scale, but their observations may deviate $\pm 3$ gr from the weight actually indicated by the sensor. Then, we could define our distinguishing relation in such a way that 45 gr $\leftrightarrow$ 47 gr (testers cannot be sure that 45 gr and 47 gr observations indeed denote different IUT behaviors) and 47 gr $\leftrightarrow$ 49 gr, but 45 gr $\not\leftrightarrow$ 49 gr. Similarly, the relation $\leftrightarrow$ given in the previous paragraph is not transitive either.
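The non-transitivity of the scale example can be checked mechanically. The sketch below assumes, purely for illustration, that two readings are indistinguishable when they differ by at most a tolerance of 3 gr; the helper `is_transitive` is a hypothetical name.

```python
from itertools import product


def indistinguishable(w1, w2, tolerance=3):
    """Assumed rule: readings within `tolerance` grams cannot be told apart."""
    return abs(w1 - w2) <= tolerance


def is_transitive(values, related):
    """Check transitivity of `related` over a finite collection of values."""
    return all(
        related(a, c)
        for a, b, c in product(values, repeat=3)
        if related(a, b) and related(b, c)
    )
```

Here `is_transitive([45, 47, 49], indistinguishable)` is false: 45 and 47 are related, 47 and 49 are related, yet 45 and 49 are not.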

Next we introduce our notion of computation formalism.

Definition 2. Let $I$ be a set of inputs and $O$ be a set of distinguished outputs. A computation formalism $C$ for $I$ and $O$ is a set of functions $f : I \rightarrow 2^O$ where for all $i \in I$ we have $f(i) \neq \emptyset$.

Given a function $f \in C$, $f(i)$ represents the set of outputs we can obtain after applying input $i \in I$ to the computation artifact represented by $f$. Since $f(i)$ is a set, $f$ may represent a non-deterministic behavior. Besides, $C$, $I$ and $O$ can be infinite sets. Representing the behavior of systems by means of functions avoids implicitly imposing a specific structure on system models (e.g. states, transitions). Still, elaborate behaviors can be represented. The next example shows different computation formalisms.
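Before turning to concrete formalisms, Definition 2 can be sketched in Python by modeling a behavior as a function from an input to a set of outputs. Since the input set may be infinite, the checks below only inspect a finite sample of inputs; all names and the toy behaviors are hypothetical.

```python
def is_valid_behavior(f, inputs):
    """Every sampled input must yield a non-empty set of outputs."""
    return all(len(f(i)) > 0 for i in inputs)


def is_deterministic(f, inputs):
    """Deterministic on the sample iff every output set is a singleton."""
    return all(len(f(i)) == 1 for i in inputs)


def doubler(i):
    """A deterministic toy behavior."""
    return {2 * i}


def noisy_doubler(i):
    """A non-deterministic toy behavior with two possible outputs."""
    return {2 * i, 2 * i + 1}


def silent(i):
    """Not a valid behavior: it offers no output at all."""
    return set()


SAMPLE_INPUTS = range(5)
```

A computation formalism would then be a collection of such functions, e.g. `[doubler, noisy_doubler]`, whereas `silent` is excluded by the non-emptiness requirement.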

2. Alternatively, a tester could assume a kind of non-termination observability for practical reasons. For instance, the tester could assume non-termination if he observes that some timeout is reached. In that alternative case, he could consider $o' \not\leftrightarrow o''$ iff $o' \neq o''$.


Example 2. Let $C$ be a computation formalism representing the set of all (possibly non-deterministic) Mealy machines (also known as finite state machines, FSMs). Let $M$ be an FSM and $I'$ and $O'$ be the set of inputs and the set of outputs of $M$, respectively. $M$ is represented in $C$ by a function $f \in C$ such that we have $f(s) = \{s'_1, \ldots, s'_n\}$ if and only if $\{s'_1, \ldots, s'_n\}$ is the set of sequences of $O'$ outputs that can be answered by $M$ when it receives the sequence $s$ of $I'$ inputs. For instance, if $s = a \cdot b$, $s' = x \cdot y$, and $s'' = w \cdot z$, then $f(s) = \{s', s''\}$ means that $f$ represents an FSM $M$ where, in particular, if $a \cdot b$ is given then the machine can answer either $x \cdot y$ or $w \cdot z$. Hence, the set $I$ considered in Definition 2 (respectively, the set $O$) is the set of all sequences of symbols belonging to $I'$ (resp. to $O'$), that is, $I = (I')^*$ and $O = (O')^*$. Thus, even if $I'$ and $O'$ are finite, $I$ and $O$ are infinite sets. Alternatively, if systems were assumed to be deterministic FSMs, then all functions representing a non-deterministic FSM would be removed from $C$. Other restrictions on the set of systems represented by $C$ could be considered (e.g. considering only FSMs with fewer than $n$ states for some given $n \in \mathbb{N}$, etc.).
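The mapping from an FSM to its function can be sketched as follows. The transition-table encoding, the state names, and the helper `fsm_behavior` are assumptions made for this illustration; the sketch also assumes the machine defines at least one transition for every state and input, so that every input sequence yields a non-empty set of output sequences.

```python
def fsm_behavior(transitions, initial):
    """Return the function f mapping an input sequence (a tuple) to the set
    of output sequences the machine may answer. `transitions` maps
    (state, input) to a set of (output, next_state) pairs."""

    def f(inputs):
        # Each run pairs the outputs produced so far with the current state.
        runs = {((), initial)}
        for symbol in inputs:
            runs = {
                (outs + (out,), nxt)
                for outs, state in runs
                for out, nxt in transitions.get((state, symbol), ())
            }
        return frozenset(outs for outs, _ in runs)

    return f


# Hypothetical machine that can answer a.b with either x.y or w.z.
TRANSITIONS = {
    ("q0", "a"): {("x", "q1"), ("w", "q2")},
    ("q0", "b"): {("x", "q0")},
    ("q1", "a"): {("x", "q1")},
    ("q1", "b"): {("y", "q0")},
    ("q2", "a"): {("w", "q2")},
    ("q2", "b"): {("z", "q0")},
}
f = fsm_behavior(TRANSITIONS, "q0")
```

Calling `f(("a", "b"))` yields the two output sequences `("x", "y")` and `("w", "z")`, mirroring the instance discussed in the example.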

Now, let us consider a computation formalism $C$ that represents interactive programs written in a given programming language. The interaction of a user with the program can be represented in a similar way as in Example 1. For example, the behavior of a given program $P$ could be denoted by a function $f$ such that, in particular:

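$$f\bigl(\langle (0.5, \mathit{but1}), (1.25, \mathit{but2}) \rangle\bigr) = \left\{ \begin{array}{l} \langle (0.75, \mathit{mes1}), (1.5, \mathit{mes2}), (2, \mathit{stop}) \rangle, \\ \langle (0.75, \mathit{mes1}), (1.5, \mathit{mes3}), (2, \mathit{mes4}), (2.5, \mathit{stop}) \rangle, \\ \langle (0.75, \mathit{mes1}), \bot \rangle \end{array} \right\}$$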

meaning that if the user presses button 1 at time 0.5 and button 2 at time 1.25, then she will receive message 1 at time 0.75 for sure, and next the program non-deterministically chooses one of the following choices: (a) answering message 2 at time 1.5 and stopping at time 2; (b) answering message 3 at time 1.5, giving message 4 at time 2, and then finishing at time 2.5; or (c) not terminating (denoted by the $\bot$ symbol; the effect of non-terminating behaviors on testing will be further discussed later). For example, we have $\langle (0.5, \mathit{but1}), (1.25, \mathit{but2}) \rangle \in I$ and $\langle (0.75, \mathit{mes1}), (1.5, \mathit{mes2}), (2, \mathit{stop}) \rangle \in O$. Alternatively, if temporal issues were not considered relevant for assessing the correctness of behaviors, then time stamps could be removed from the proposed representation, or they could be replaced by ordering stamps denoting the relative order of each $O'$ output with respect to $I'$ inputs. If other factors were considered relevant, additional details could be denoted.

If two FSMs (resp., two programs) produce the same sets of outputs for all inputs then both machines are represented by the same function $f \in C$. This is because a function belonging to $C$ represents a relation between inputs and outputs (but not the internal structure of the machine leading to this behavior).
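The FSM part of Example 2 can be made concrete with a small sketch. The machine `M`, its state names, and the helper `responses` below are hypothetical illustrations (they are not taken from the paper); the sketch only assumes that a Mealy machine is given as a transition table and shows how such a machine induces the function $f$ mapping each input sequence to the set of output sequences it may answer.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical non-deterministic Mealy machine (illustration only):
# (state, input symbol) -> list of (output symbol, next state) choices.
Transitions = Dict[Tuple[str, str], List[Tuple[str, str]]]

M: Transitions = {
    ("q0", "a"): [("x", "q1"), ("w", "q2")],  # non-deterministic choice on a
    ("q0", "b"): [("y", "q0")],
    ("q1", "a"): [("x", "q1")],
    ("q1", "b"): [("y", "q0")],
    ("q2", "a"): [("w", "q2")],
    ("q2", "b"): [("z", "q0")],
}

def responses(machine: Transitions, seq: str, start: str = "q0") -> FrozenSet[str]:
    """Return f(seq): the set of output sequences the machine may answer."""
    frontier = {("", start)}  # pairs (outputs produced so far, current state)
    for symbol in seq:
        frontier = {
            (out + o, nxt)
            for (out, state) in frontier
            for (o, nxt) in machine.get((state, symbol), [])
        }
    return frozenset(out for (out, _) in frontier)
```

With this table, `responses(M, "ab")` contains exactly the two answers `xy` and `wz`, mirroring the shape of $f(a \cdot b) = \{x \cdot y, w \cdot z\}$ in the example. Two different tables that yield the same `responses` for every sequence would be the same element of $C$.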

Computation formalisms will be used to represent the set of implementations we are considering in a given testing scenario. Implicitly, a computation formalism $C$ represents a fault model (i.e. the definition of what can be wrong in the implementation under test) as well as the hypotheses about the IUT the tester is assuming. For instance, if the IUT is assumed to be a deterministic FSM and to differ from a given correct FSM in at most one transition, then only functions denoting the behaviors of such FSMs (including the correct one) are in $C$. Alternatively, if we only assume that the IUT is represented by a deterministic FSM, then $C$ will represent all deterministic FSMs.

Computation formalisms will also be used to represent the subset of specification-compliant implementations. Let $C$ represent the set of possible implementations and $E \subseteq C$ represent the set of implementations fulfilling the specification. The goal of testing is interacting with the IUT so that, according to the collected responses, we can decide whether the IUT actually belongs to $E$ or not. Typically, we apply some tests (i.e., some inputs $i_1, i_2, \ldots \in I$) to the IUT so that observed results $o_1 \in f(i_1)$, $o_2 \in f(i_2)$, $\ldots$ allow us to provide a verdict. Since all outputs are returned by the same function $f$, we are implicitly assuming that all input applications are independent, i.e. the result after applying $i_1$ does not affect the output observed next, when we apply $i_2$.$^{3}$ Note that dependencies between consecutive input applications can still be represented by using sequences, as in Example 2. For instance, let $i_1$ and $i_2$ be inputs such that, if we apply input $i_1$ before $i_2$, then we obtain an output $o_1$; however, if we apply only input $i_2$, then we obtain a different output $o_2$. In this case, the inputs of the computation formalism would not be system inputs but sequences of system inputs, and the system would be represented by a function $f$ such that $f(i_1 \cdot i_2) = \{o_1\}$ and $f(i_2) = \{o_2\}$.

Definition 3. Let $C$ be a computation formalism. A specification

of $C$ is a set $E \subseteq C$.

If $f \in E$ then $f$ denotes a correct behavior, whereas $f \in C \setminus E$ denotes that $f$ is incorrect. Thus, a specification implicitly denotes a correctness criterion. For example, let $f, f' \in C$ be two functions such that for all $i$ we have $f(i) = \{a\}$ and $f'(i) = \{b\}$. Then, $E = \{f, f'\}$ denotes that only machines always producing $a$ or always producing $b$ are considered correct. We can also construct $E$ in such a way that a given underlying semantic relation is considered (e.g. bisimulation, testing preorder, trace inclusion, conformance testing, etc.). For instance, given some $f \in C$, let us consider that $f$ is correct and we wish to be consistent with respect to a given semantic relation $\preceq$, where $A \preceq B$ means that $A$ is correct with respect to $B$. Then we could define $E$ as follows: $E = \{f' \mid f' \preceq f \wedge f' \in C\}$.
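As a minimal sketch of the last construction, the following code derives a specification from a reference behavior and a semantic relation. The finite formalism `C`, the behavior names, and the choice of relation (output inclusion: a candidate is correct if it never produces an output the reference forbids) are all assumptions made for illustration; the paper does not fix any particular relation.

```python
from typing import Dict, FrozenSet, Set

Behaviour = Dict[str, FrozenSet[str]]  # input -> set of possible outputs

# Hypothetical finite formalism over the inputs {"a", "b"} (illustration only).
C: Dict[str, Behaviour] = {
    "f":  {"a": frozenset({"x", "y"}), "b": frozenset({"z"})},
    "g1": {"a": frozenset({"x"}),      "b": frozenset({"z"})},
    "g2": {"a": frozenset({"y"}),      "b": frozenset({"z"})},
    "g3": {"a": frozenset({"x"}),      "b": frozenset({"w"})},
}

def refines(candidate: Behaviour, reference: Behaviour) -> bool:
    """Assumed relation: every output allowed by candidate is allowed by reference."""
    return all(candidate[i] <= reference[i] for i in reference)

def specification(formalism: Dict[str, Behaviour], reference_name: str) -> Set[str]:
    """E = { f' in C | f' is related to the reference behaviour }."""
    reference = formalism[reference_name]
    return {name for name, beh in formalism.items() if refines(beh, reference)}

E = specification(C, "f")
```

Here `E` contains `f`, `g1` and `g2`, while `g3` is excluded because it answers `w` to input `b`. Swapping `refines` for another relation changes the correctness criterion without touching the rest of the framework.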

Next we identify test suites and complete test suites. By applying test cases in a test suite (i.e. a set of inputs $\mathcal{I} \subseteq I$) to an IUT, collected outputs should (ideally) let us determine whether the IUT is correct or not, i.e.

3. In practice, this is equivalent to assuming the existence of a reliable reset button, a typical testing assumption. Since a reset button allows recovering the initial state after each test application, it allows reasoning independently about each test application in a sequence of test applications.


whether the function denoting its behavior belongs to a given specification set $E \subseteq C$. Unfortunately, in black-box testing we do not have access to the IUT definition, so the only thing we can do is apply tests and collect the outputs given by the IUT. If observed outputs can be produced by some possible correct behavior (i.e. some $f \in E$) but cannot be produced by any possible incorrect behavior (i.e. any $f \in C \setminus E$), then we know for sure that the IUT is correct; otherwise, we cannot assure the IUT correctness. If a test suite $\mathcal{I} \subseteq I$ is such that observed outputs will necessarily let us decide the (in-)correctness of the IUT, then we say that the test suite is complete. This requires that, for any pair of correct and incorrect functions (i.e. $f \in E$ and $f' \in C \setminus E$, respectively), there must be a test $i \in \mathcal{I}$ that distinguishes $f$ and $f'$. Let $f''$ be the actual behavior of the IUT. When tests $i_1, i_2, \ldots \in \mathcal{I}$ are applied to $f''$, some outputs $o_1, o_2, \ldots \in O$ are (in general non-deterministically) chosen by $f''$ and observed ($o_1 \in f''(i_1), o_2 \in f''(i_2), \ldots$). If $\mathcal{I}$ is complete then, for each pair $f \in E$ and $f' \in C \setminus E$ of possible correct and incorrect behaviors of the IUT, some of these outputs, say $o_j$, must let us distinguish between $f$ and $f'$. Assuring this property a priori, i.e. regardless of non-deterministic choices actually taken by the IUT, requires that, for some $i_j$, all possible non-deterministic output responses of $f$ must be pairwise distinguishable from all possible non-deterministic output responses of $f'$. In this case, outputs collected by applying $\mathcal{I}$ to the IUT will let us determine the (in-)correctness of the IUT in any case.

Let us note that, as explained before in Section 1.1, our completeness notion differs from several classical adequacy notions given in the literature, where non-determinism is not taken into account and, more importantly, a quite stronger condition is required: rather than determining whether the IUT behavior belongs to $E$ or not, it is required that we are able to distinguish any candidate behavior from any other candidate behavior from a given set. Note that, given $f_1, f_2 \in E$ (respectively, $f_1, f_2 \in C \setminus E$), our notion does not require that $f_1$ and $f_2$ are distinguishable, so our requirement is weaker. Besides, defining specific distinguishing relations $\not\leftrightarrow$ enables subtler relations between observable outputs.

Definition 4. Let $C$ be a computation formalism for $I$ and $O$, $E \subseteq C$ be a specification, and $\mathcal{I} \subseteq I$ be a set of inputs.

We say that $f \in E$ and $f' \in C \setminus E$ are distinguished by $\mathcal{I}$, denoted by $\mathrm{di}(f, f', \mathcal{I})$, if there exists $i \in \mathcal{I}$ such that $f(i) \not\leftrightarrow f'(i)$.

We say that $\mathcal{I}$ is a complete test suite for $C$, $E$ if for all $f \in E$ and $f' \in C \setminus E$ we have $\mathrm{di}(f, f', \mathcal{I})$.

Note that, in the previous definition, $\not\leftrightarrow$ is applied to sets of outputs, so all outputs from a set are required to be distinguishable from all outputs from the other set.
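For a finite formalism, Definition 4 can be checked mechanically. The sketch below is an illustration under stated assumptions: behaviors are dictionaries from inputs to output sets, the default distinguishing relation on single outputs is plain inequality, and the three-function formalism `C` is a hypothetical toy, not an example from the paper.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Set

Behaviour = Dict[str, FrozenSet[str]]
Distinct = Callable[[str, str], bool]

def sets_distinguished(outs1: Iterable[str], outs2: Iterable[str],
                       distinct: Distinct = lambda o1, o2: o1 != o2) -> bool:
    """Lift the relation to sets: every pair of outputs must be distinguishable."""
    return all(distinct(o1, o2) for o1, o2 in product(outs1, outs2))

def di(f: Behaviour, f_bad: Behaviour, suite: Iterable[str]) -> bool:
    """di(f, f', suite): some input of the suite separates the two behaviours."""
    return any(sets_distinguished(f[i], f_bad[i]) for i in suite)

def is_complete(C: Dict[str, Behaviour], E: Set[str], suite: Set[str]) -> bool:
    """Every correct behaviour is distinguished from every incorrect one."""
    incorrect = set(C) - E
    return all(di(C[g], C[b], suite) for g in E for b in incorrect)

# Hypothetical toy formalism: f is correct, g and h are incorrect.
C = {
    "f": {"a": frozenset({"0"}),      "b": frozenset({"0", "1"})},
    "g": {"a": frozenset({"0", "1"}), "b": frozenset({"2"})},
    "h": {"a": frozenset({"1"}),      "b": frozenset({"0"})},
}
E = {"f"}
```

In this toy case neither `{"a"}` nor `{"b"}` is complete: on `a`, `g` may answer `0` exactly as `f` does, and on `b`, `h` answers `0`, which `f` may also produce. The suite `{"a", "b"}` is complete.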

Observations collected by complete test suites precisely determine whether the IUT is correct or not. Let $\mathcal{I}$ be a complete test suite for the computation formalism $C$ and the specification $E$. Then, we trivially conclude that:

A) if $f \in E$, then there is no $f' \in C \setminus E$ such that $f'(i) \leftrightarrow f(i)$ for all $i \in \mathcal{I}$, and

B) reciprocally, if $f \in C \setminus E$ then there does not exist $f' \in E$ such that $f'(i) \leftrightarrow f(i)$ for all $i \in \mathcal{I}$.

That is, after applying $\mathcal{I}$, no correct IUT can be considered as incorrect, and no incorrect IUT can be considered as correct.

3 TESTABILITY HIERARCHY

Once the basic general framework has been presented, we introduce four of the five classes of our testability hierarchy: Class I, Class III, Class IV, and Class V (Class II will be defined later).

Definition 5. Let $C$ be a computation formalism for $I$ and $O$, and $E \subseteq C$ be a specification.

We say that a pair $(C, E)$ is finitely testable if there exists a finite complete test suite for $C$, $E$. We denote the set of all finitely testable pairs $(C, E)$ by Class I.

We say that $(C, E)$ is countably testable if there exists a countable complete test suite for $C$, $E$. We denote the set of all countably testable pairs $(C, E)$ by Class III.

We say that $(C, E)$ is testable if there exists a complete test suite for $C$, $E$. We denote the set of all testable pairs $(C, E)$ by Class IV.

We denote the set of all pairs $(C, E)$ by Class V.

This classification induces the relation Class I $\subseteq$ Class III $\subseteq$ Class IV $\subseteq$ Class V. Class II, whose definition is harder, will be presented in Section 3.2, Definition 7. Next we show some examples fitting into each of these testability classes. They show that all the inclusions considered in the previous relation are proper indeed. Besides, the proposed example of a pair $(C, E)$ in Class IV but not in Class III will be used to argue that the interest of Class IV is, perhaps, just theoretical.

Example 3. Given $n \in \mathbb{N}$, let $C_1$ represent the set of all deterministic completely-specified FSMs with at most $n$ states where the finite sets of inputs and outputs are $I'$ and $O'$, respectively. Let $E_1 \subseteq C_1$, and $\mathcal{I}_1$ be the set of all sequences of $I'$ symbols whose length is at most $2n + 1$. It is known that, if two FSMs $M_1$ and $M_2$ represented by $C_1$ produce different responses for some input sequence, then sequences answered by $M_1$ and $M_2$ are different for at least one sequence belonging to $\mathcal{I}_1$ [9], [23]. Hence, for all $f \in E_1$ and $f' \in C_1 \setminus E_1$ there exists a sequence $i \in \mathcal{I}_1$ allowing to distinguish $f$ and $f'$. Thus, $(C_1, E_1) \in$ Class I. As we will see in Example 6 (used to illustrate notions of Section 4) this result can also be proved by applying Lemma 1 (c), which makes this result straightforward and avoids the necessity of identifying any specific complete test suite $\mathcal{I}_1$.

We consider again the previous case, but this time we remove the restriction that FSMs have at most $n$ states. Let $C_2$ be the resulting computation formalism, and let $E_2 \subseteq C_2$. It is known that, in general, given two (possibly infinite) sets of deterministic FSMs, there is no finite set of sequences $\mathcal{I}_2$ allowing to distinguish each member of the first set from each FSM in the other set. Hence, in general we have $(C_2, E_2) \notin$ Class I (in Example 6 we also prove this property, this time by using forthcoming Lemma 1 (b)). However, the set of all sequences of inputs from $I'$, that is $I'^*$, distinguishes all pairs of non-equivalent deterministic FSMs. In fact, $I'^*$ can be numbered (e.g. by length and then lexicographically). Thus, we have $(C_2, E_2) \in$ Class III.
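The countability argument can be illustrated with a generator that numbers all finite sequences over a finite alphabet. This is only a sketch; the function name `enumerate_sequences` is hypothetical. Sequences are ordered by length first and alphabetically within each length, since a plain dictionary order over an infinite set would list $a, aa, aaa, \ldots$ and never reach $b$.

```python
from itertools import count, islice, product
from typing import Iterator, Sequence

def enumerate_sequences(alphabet: Sequence[str]) -> Iterator[str]:
    """Yield every finite sequence over `alphabet`: shortest first,
    ties broken by the order of symbols in `alphabet`."""
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

# The first seven tests of the countable complete suite for inputs {a, b}.
first_tests = list(islice(enumerate_sequences("ab"), 7))
```

Every sequence of length $m$ appears after finitely many steps, so each test of the suite is eventually applied, which is what "countably testable" relies on.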

Let us consider a variant of deterministic FSMs where the transition taken at each situation depends on the elapsed time, and let the time be assumed to be continuous. For each state and input, a machine may have several outgoing transitions, each one labeled by a time interval $[t_1, t_2]$ with $t_1, t_2 \in \mathbb{R}$ such that $t_1 \leq t_2$. When an input $i$ is received by the machine, the machine takes the first defined transition reacting to $i$ such that the elapsed time fits into its interval. We denote by $C_3$ the proposed computation formalism, and we consider $E_3 \subseteq C_3$. The set of all input sequences produced at all possible real times is a complete test suite for $(C_3, E_3)$, so $(C_3, E_3) \in$ Class IV. However, in general we have $(C_3, E_3) \notin$ Class III. In order to see this, let us note that the IUT could have some faulty transitions with intervals of the form $[t, t]$ with $t \in \mathbb{R}$, and detecting such transitions requires considering all input sequences at all real times.

Now, let $C_3'$ be defined as $C_3$, but only intervals following the form $[t_1, t_2]$, with $t_1 < t_2$ and $t_1, t_2 \in \mathbb{R}$, are allowed in machines represented by $C_3'$. Let $E_3' \subseteq C_3'$. In this case, the set of all input sequences produced at all possible rational times is complete for $C_3'$, $E_3'$: For all $t_1, t_2 \in \mathbb{R}$ with $t_1 < t_2$ there exists a rational $t$ with $t_1 < t < t_2$, so by emitting all input sequences at all possible rational times we will be able to observe all IUT transitions. Since the set of all rational numbers is countable and the set of all finite input sequences is so, the set of all finite sequences of rational-timed inputs is countable as well (note that if $A$ is countable then $A^*$ is countable, and if $B$ is countable too then $A \times B$ is countable as well). Hence, we have $(C_3', E_3') \in$ Class III.$^{4}$

We could argue that the problematic intervals of $C_3$, i.e. those of the form $[t, t]$ with $t \in \mathbb{R}$, are artificial because we cannot produce an input at the (exact) time $t$. As far as we have analyzed other examples in Class IV but not in Class III, they are so for similar reasons. Thus, the practical utility of Class IV, if it has any, has not been proved yet. However, we include Class IV in our hierarchy for the sake of theory completeness.

Let $C_4$ be a computation formalism representing all non-deterministic (non-timed) FSMs, and let $E_4 \subseteq C_4$. We consider an FSM $M_1$ that answers $b$ when $a$ is received, and another FSM $M_2$ which behaves in the same way as $M_1$ for any input different from $a$, and behaves non-deterministically for $a$: If $M_2$ receives $a$, then $M_2$ can answer either $b$ or $c$. Let us suppose that $M_1$ is correct and $M_2$ is not, and let them be represented in $C_4$ by $f \in E_4$ and $f' \in C_4 \setminus E_4$, respectively. Despite the fact that we could obtain $c$ after applying $a$ to $f'$, input $a$ does not necessarily distinguish $f$ and $f'$ because both of them could produce $b$ in response to $a$. In fact, $M_2$ could hide the output $c$ for any arbitrarily long time, no matter how many times we apply $a$. Thus, no input allows the tester to necessarily distinguish $f$ and $f'$, so we have $(C_4, E_4) \in$ Class V but $(C_4, E_4) \notin$ Class IV.$^{5}$

Non-determinism does not imply that finite testability is not possible. Let us consider $C_5 = \{f, f'\}$ and $E_5 = \{f\}$ be such that $f(a) = \{x\}$, $f'(a) = \{x, y\}$, $f(b) = \{x, y\}$, and $f'(b) = \{z, w\}$. Then, $\{b\}$ is a complete test suite for $C_5$ and $E_5$. Thus, $(C_5, E_5) \in$ Class I.

Finally let us note that $(C, E) \in$ Class I does not imply that $C$ or $E$ are finite, or even that $C$ and $E$ represent some simple computation formalisms such as e.g. FSMs. Let $C$ represent all deterministic terminating functions from integers to integers, written in some programming language, that are either strictly increasing or strictly decreasing. Let $E \subseteq C$ consist of all strictly increasing functions in $C$. Though $C$ and $E$ are infinite sets and FSMs cannot express all functions in $C$ or all functions in $E$, $\{3, 5\}$ is a complete test suite for $(C, E)$: For all $f \in C$, $f(3) < f(5)$ iff $f \in E$. So, $(C, E) \in$ Class I. This trivial example shows that finite testability does not lie in the finiteness or the simplicity of $C$ or $E$, but in the simplicity of the border between correctness and incorrectness.
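The last case of Example 3 can be sketched as a two-test verdict procedure. The sample functions below are hypothetical members of $C$ chosen for illustration; the verdict is only sound under the stated hypothesis that the IUT is strictly increasing or strictly decreasing.

```python
from typing import Callable, List

IntFunction = Callable[[int], int]

def verdict(iut: IntFunction) -> bool:
    """Apply the suite {3, 5}. True means 'the IUT is in E' (strictly
    increasing), assuming the IUT is strictly monotone in one direction."""
    return iut(3) < iut(5)

# Hypothetical sample members of C (illustration only).
increasing: List[IntFunction] = [lambda n: n, lambda n: 2 * n + 7, lambda n: n ** 3]
decreasing: List[IntFunction] = [lambda n: -n, lambda n: 10 - 3 * n, lambda n: -(n ** 3)]

# A single test is not enough: these two functions agree on input 3.
same_at_3 = (lambda n: n)(3) == (lambda n: 6 - n)(3)
```

The suite works with two inputs because comparing two observed values already reveals the direction of a strictly monotone function; a single input such as 3 cannot, since a correct and an incorrect function may return the same value there.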

If $(C, E) \in$ Class I then we can precisely decide if the IUT is correct or not (i.e., if it belongs to the specification set) by considering the answers collected by a complete test suite. On the contrary, if the problem belongs to Class III or Class IV, but not to Class I, then applying a complete test suite to the IUT is infeasible because the suite is infinite. In particular, if the problem belongs to Class III then we could rather speak about testability in the limit, i.e. as we iteratively apply more tests, we could tend to complete testing coverage (later this idea will be further elaborated to define Class II, given in Definition 7). This is not the case for problems in Class IV but not in Class III: Applying an infinite number of tests one after another in a given order is not enough to reach complete coverage, because the complete test suite is not countable.

3.1 Non-Termination

Next we discuss the relation between complete test suites and non-termination. Let us consider a computation formalism $C$ where systems may not terminate, and let $\mathcal{I}$ be a finite complete test suite. Each correct function is distinguished from each incorrect function for some $i \in \mathcal{I}$ (and vice versa). Let us suppose that the distinguishing relation $\not\leftrightarrow$ is defined in such a way that two outputs are distinguished only if we can distinguish them in finite time (e.g. as we did in Example 1, second paragraph). Let $f \in C \setminus E$ (the argument if $f \in E$ is symmetric). If we apply all inputs in $\mathcal{I}$ to $f$ then, for some of these inputs, we will observe some outputs allowing to distinguish $f$ from all correct functions. By the definition

4. Besides, there are in the literature some specific temporal systems dealing with continuous time where testing completeness can be finitely achieved. This is the case for Timed Automata under the assumptions that the number of states is known, and upper and lower bounds of all temporal intervals are required to be integers [31]. Then, the time can be discretized into a finite set of time regions, and completeness can be finitely achieved by considering only these regions for testing purposes.

5. If fairness were assumed, we would be considering a different computation formalism $C_4' \neq C_4$. In $C_4'$, for every single input denoting a long repetition of the same experiment, the single output would denote that all possible reactions are observed at least once.


of $\not\leftrightarrow$, for all of these inputs the part of the answer needed to make the distinction is reached in finite time. However, for those inputs of $\mathcal{I}$ not distinguishing function $f$ from any correct function, $f$ might not terminate (this is the case if the specification permits non-termination for some inputs). Thus, the procedure consisting in applying all inputs of $\mathcal{I}$ one after another might not terminate.

Nevertheless, let us note that we could always return verdicts in finite time if all inputs in $\mathcal{I}$ were applied to (several instances of) the IUT in parallel or interleaved: Since each required distinction is provided in finite time, once all distinctions are reached we can stop the parallel/interleaved execution of tests and provide a verdict. Alternatively, we could adopt a more conservative approach where complete test suites are not allowed to include any input $i$ such that, for some $f \in C$, $f(i)$ might not terminate. In this case, the application of a complete test suite terminates even if inputs are applied sequentially. The results presented later in Section 4 also apply to this alternative notion of complete test suite, once a straightforward adaptation is made in each case.
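The interleaved strategy can be sketched with a round-robin scheduler. Everything here is an assumed model for illustration: each test application is a Python generator that yields one observation per time slice (`None` when nothing is output in that slice), and `decided` is a hypothetical predicate telling whether the observations collected so far already provide all required distinctions.

```python
from typing import Callable, Dict, Iterator, List, Optional

Observations = Dict[str, List[Optional[str]]]

def run_interleaved(runs: Dict[str, Iterator[Optional[str]]],
                    decided: Callable[[Observations], bool],
                    max_rounds: int = 10_000) -> Observations:
    """Advance every test application one step per round; stop as soon as
    the collected observations are enough to give a verdict."""
    observed: Observations = {name: [] for name in runs}
    active = dict(runs)
    for _ in range(max_rounds):
        if decided(observed) or not active:
            break
        for name in list(active):
            try:
                observed[name].append(next(active[name]))
            except StopIteration:
                del active[name]  # this test application has terminated
    return observed

def silent_loop() -> Iterator[Optional[str]]:
    """A test application that never terminates and never outputs anything."""
    while True:
        yield None

def answers_late() -> Iterator[Optional[str]]:
    """A test application that outputs the distinguishing symbol c at step 3."""
    yield None
    yield None
    yield "c"
```

Applying the two tests sequentially, with `silent_loop()` first, would never reach the second test. Interleaving them with `decided = lambda o: "c" in o["i2"]` stops after three rounds, once the distinguishing output has been observed.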

3.2 Definition of Class II

In this section we elaborate on our idea of finite testability in the limit or, more precisely, unboundedly-approachable finite testability. When finite testability is not possible, in some cases it may still be possible to test the IUT up to any arbitrarily high distinguishing rate with a finite test suite. As distinguishing rate we will consider the ratio between the number of pairs of correct/incorrect functions that are distinguished by the test suite and the total number of correct/incorrect pairs. Let us note that this ratio can only be defined if the computation formalism $C$ is finite.

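Definition 6. Let $C$ be a finite computation formalism for $I$ and $O$, $E \subseteq C$ be a specification, and $\mathcal{I} \subseteq I$ be a set of inputs. We define the distinguishing rate of $\mathcal{I}$ for $(C, E)$, denoted by $\text{d-rate}(\mathcal{I}, C, E)$, as

$$\frac{|\{(f, f') \mid f \in E,\ f' \in C \setminus E,\ \mathrm{di}(f, f', \mathcal{I})\}|}{|E| \cdot |C \setminus E|}$$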
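The distinguishing rate can be computed directly for small finite formalisms. The following sketch assumes behaviors are dictionaries from inputs to output sets and that two output sets are distinguished exactly when they are disjoint (i.e. outputs are distinguishable iff they differ); the four-function formalism is a hypothetical toy.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, FrozenSet, Iterable, Set

Behaviour = Dict[str, FrozenSet[str]]

def distinguished(f: Behaviour, g: Behaviour, suite: Iterable[str]) -> bool:
    """Assumed relation: some input yields disjoint sets of possible outputs."""
    return any(f[i].isdisjoint(g[i]) for i in suite)

def d_rate(C: Dict[str, Behaviour], E: Set[str], suite: Set[str]) -> Fraction:
    """Fraction of (correct, incorrect) pairs distinguished by the suite."""
    incorrect = set(C) - E
    if not E or not incorrect:
        raise ValueError("the rate needs at least one correct and one incorrect function")
    hits = sum(distinguished(C[g], C[b], suite) for g, b in product(E, incorrect))
    return Fraction(hits, len(E) * len(incorrect))

# Hypothetical toy formalism: f is correct; b1, b2, b3 are faulty variants.
C = {
    "f":  {"a": frozenset({"0"}), "b": frozenset({"0"})},
    "b1": {"a": frozenset({"1"}), "b": frozenset({"0"})},
    "b2": {"a": frozenset({"0"}), "b": frozenset({"1"})},
    "b3": {"a": frozenset({"1"}), "b": frozenset({"1"})},
}
E = {"f"}
```

In this toy case each single-input suite reaches a rate of 2/3, and the suite `{"a", "b"}` reaches 1, i.e. it is complete.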

The assumption that $C$ is finite (otherwise the previous division would be undefined) reduces the applicability of the previous notion, as many interesting computation formalisms are infinite indeed. In order to adapt this idea to infinite computation formalisms, we consider sequences of finite computation formalisms. Let us note that, if $C$ is a countable set, then there exists a non-decreasing chain of finite subsets $C^1 \subseteq C^2 \subseteq \cdots \subseteq C^k \subseteq C^{k+1} \subseteq \cdots$ such that $C = \bigcup_{i \in \mathbb{N}} C^i$. Given a specification $E \subseteq C$, let us consider the sub-specifications $E^i = E \cap C^i$. Since each $C^i$ and $E^i$ are finite sets, we can apply Definition 6 to them indeed. Then, the class of pairs $(C, E)$ which are unboundedly-approachable by finite testing (Class II) consists of those pairs such that, for some non-decreasing sequence like those described above, any arbitrarily high distinguishing rate less than 1 is reached, by some finite test suite, for almost all computation formalisms in the sequence.

Definition 7. We say that $(C, E)$ is unboundedly-approachable by finite testing if there is a non-decreasing sequence $C^1 \subseteq C^2 \subseteq \cdots \subseteq C^k \subseteq C^{k+1} \subseteq \cdots$ with $C = \bigcup_{i \in \mathbb{N}} C^i$ such that, for all $\epsilon < 1$, there exists a finite set of inputs $\mathcal{I} \subseteq I$ and $n \in \mathbb{N}$ such that, for all $l \geq n$, $\text{d-rate}(\mathcal{I}, C^l, E^l) \geq \epsilon$, where $E^l = C^l \cap E$ for all $l \in \mathbb{N}$.

We denote by Class II the set of all pairs $(C, E)$ that are unboundedly-approachable by finite testing.

It is straightforward to see that this class is related with the other classes as expected: Class I $\subseteq$ Class II $\subseteq$ Class III. Moreover, the next two examples show that both inclusions are proper.

Example 4. We revisit $(C_2, E_2)$ as defined before in Example 3. In particular, let us assume that the sets of FSM inputs and outputs are $I' = \{a, b\}$ and $O' = \{0, 1\}$, respectively, and $E_2 = \{f_1\}$ consists of a single function $f_1$. Next we prove that $(C_2, E_2) \in$ Class II, i.e. $(C_2, E_2)$ is unboundedly-approachable by finite testing.

First we define a non-decreasing sequence of finite sets $C^0 \subseteq C^1 \subseteq C^2 \subseteq \cdots \subseteq C^n \subseteq C^{n+1} \subseteq \cdots$ as required by the definition of Class II. Let us introduce some notation. We say that $f \in C_2$ is the smallest function fulfilling a given criterion $Q$ if $f$ fulfills $Q$ and, for every function $f' \in C_2 \setminus \{f\}$ fulfilling $Q$, the smallest FSM (i.e. the one with fewest states) behaving as defined by $f$ has fewer states than the smallest FSM behaving according to $f'$, or both have the same number of states but $f$ is smaller according to some fixed arbitrary criterion (e.g. we can sort tied functions lexicographically according to their respective infinite sequences $f(a) \cdot f(b) \cdot f(aa) \cdot f(ab) \cdot f(ba) \cdot f(bb) \cdot f(aaa) \cdots$). Note that, for any set of functions, there always exists the smallest function fulfilling $Q$ (as long as some function fulfills $Q$).

We build each set $C^n$ as follows. First, let $C^0 = E_2 = \{f_1\}$. Besides, for all $n \geq 1$, the set $C^n$ is constructed as follows. Let us consider the set of all combinations of output responses that a deterministic FSM could give for all input sequences in $\{a, b\}^n$. For instance, if $n = 2$, then all possible responses to sequences $aa$, $ab$, $ba$, $bb$ consist in answering 00, 00, 00, 00 respectively, 00, 00, 00, 01 respectively, ..., 11, 11, 11, 11 respectively. Note that not all combinations of 8 bits are possible indeed. For instance, the combination 00, 10, 00, 00 is impossible because we cannot have $f(aa) = \{00\}$ and $f(ab) = \{10\}$: It would imply that, at its first state, the FSM can answer either 0 or 1 when $a$ is provided, but the FSM must be deterministic. According to these notions, $C^n$ is constructed as follows: For each combination of responses to $\{a, b\}^n$ which is possible in a deterministic FSM, we introduce in $C^n$ the smallest function providing that combination if $f_1$ does not provide that particular combination, else $f_1$ is introduced instead; and there are no other functions in $C^n$.

Let us show that, for all $i \geq 0$, we have $C^i \subseteq C^{i+1}$. Let $f \in C^i$. This implies that either $f = f_1$ or $f$ is the smallest function providing some combination of responses to $\{a, b\}^i$. If $f = f_1$ then $f \in C^{i+1}$, because $f_1$ also belongs to $C^{i+1}$. Let $f \neq f_1$. Thus $f$ is the smallest function providing some combination of responses to $\{a, b\}^i$. Note that $f$ also provides some combination of responses to $\{a, b\}^{i+1}$


and, moreover, it is also the smallest function providing that combination. We prove it by contradiction: If the smallest function for that combination of responses to $\{a, b\}^{i+1}$ were $f' \neq f$, then $f'$ would be smaller than $f$, so $f'$ would also be the smallest function providing the same combination as $f$ for $\{a, b\}^i$. However, the smallest function for that combination in $\{a, b\}^i$ is $f$ indeed, so we have a contradiction. Thus, $f$ also belongs to $C^{i+1}$.

Let us calculate $|C^n|$ for each $n \geq 1$. We have that $C^n$ has as many elements as possible combinations of responses to $\{a, b\}^n$ exist in a deterministic FSM. Let us consider a binary tree with depth $n$, where left arcs are labeled by $a$ and right arcs are labeled by $b$. Sequences of labels in each branch, from the root to a leaf, represent each sequence in $\{a, b\}^n$. Each combination of responses of a deterministic FSM to all of these sequences can be represented by assigning a value in $\{0, 1\}$ to each arc in the tree. Since there are $2^{n+1} - 2$ arcs in the tree, there are $2^{2^{n+1} - 2}$ possible assignments. Thus, $|C^n| = 2^{2^{n+1} - 2}$.

We also have to prove $C_2 = \bigcup_{n \in \mathbb{N}} C^n$. Let $f \in C_2$. Let us suppose that the smallest deterministic FSM behaving as $f$ has $k$ states. Note that the number of functions which can be represented by deterministic FSMs with $k$ or fewer states is finite; let $K$ be this number. If $f = f_1$ then $f \in C^0$. Let $f \neq f_1$. By contradiction, let us suppose that $f \notin C^i$ for all $i \in \mathbb{N}$. This implies that there does not exist $i \in \mathbb{N}$ such that, for the combination of output sequences answered by $f$ to $\{a, b\}^i$, we have both that (a) this combination differs from the combination answered by $f_1$ to $\{a, b\}^i$; and (b) $f$ is the smallest function providing that combination. Since $f \neq f_1$, there exists $i_0$ such that, for all $i > i_0$, the combination of sequences responded by $f$ to $\{a, b\}^i$ differs from the combination given by $f_1$ (otherwise we would have $f = f_1$ indeed). Thus, for all $i > i_0$, (b) is not met, i.e. $f$ is not the smallest function providing its own combination of responses to $\{a, b\}^i$. For each $i > i_0$, let $f_i$ be the smallest function providing the combination of responses given by $f$ to $\{a, b\}^i$. Since we are assuming $f \notin C^i$ for all $i \in \mathbb{N}$, we have $f_i \neq f$ for all $i > i_0$. Note that, for all $i > i_0$, there must exist $j > i$ such that $f_i \neq f_j$ (otherwise $f$ and $f_i$ would have the same behavior for all $\{a, b\}^j$ with $j \in \mathbb{N}$, so we would have $f = f_i$). Let $j_1 < j_2 < \ldots$ be an infinite sequence of naturals such that $f_{j_h} \neq f_{j_{h+1}}$ for all $h$. Since there exist at most $K$ functions which are smaller than $f$, we have that $f_{j_{K+1}}$ cannot be smaller than $f$. Thus we have a contradiction, and we infer $f \in C^{j_{K+1}}$.

Let us prove $(C_2, E_2) \in$ Class II. Hereafter we will assume $s_n = |C^n|$. Let $\epsilon < 1$. We find a test suite $\mathcal{I}$ and a natural $n \in \mathbb{N}$ such that the distinguishing rate of $\mathcal{I}$ is higher than or equal to $\epsilon$ for all $C^l$ with $l \geq n$. For all $k \in \mathbb{N}$, let $\mathcal{I}_k = \{a, b\}^k$. Besides, let us consider the equivalence relation $f \equiv_k f'$ iff $f(s) = f'(s)$ for all $s \in \mathcal{I}_k$. For any $l > k$, the relation $\equiv_k$ is an equivalence relation in $C^l$. The equivalence class of a function $f$ (that is, the set of functions $f'$ such that $f \equiv_k f'$) will be denoted by $[f]$. Note that there are exactly $s_k$ equivalence classes in $C^l$, and each class has $\frac{s_l}{s_k}$ elements. Moreover, since $E_2 = \{f_1\}$, we have $\mathrm{di}(f_1, f, \mathcal{I}_k)$ iff $f_1 \not\equiv_k f$. Therefore, the number of pairs of correct-incorrect functions from $C^l$ which are distinguished by $\mathcal{I}_k$ is the number of equivalence classes minus 1 (note that the elements in the equivalence class of $f_1$ are not distinguishable from $f_1$), multiplied by the number of elements in each equivalence class. Thus we have

$$|\{(f', f) \mid f' \in E_2,\ f \in C^l \setminus \{f_1\},\ \mathrm{di}(f', f, \mathcal{I}_k)\}| = \frac{s_l}{s_k} \cdot \bigl(|\{[f] \mid f \in C^l\}| - 1\bigr) = \frac{s_l}{s_k} \cdot (s_k - 1).$$

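Therefore,

$$\text{d-rate}(\mathcal{I}_k, C^l, E_2) = \frac{|\{(f', f) \mid f' \in E_2,\ f \in C^l \setminus \{f_1\},\ \mathrm{di}(f', f, \mathcal{I}_k)\}|}{|E_2| \cdot |C^l \setminus E_2|} = \frac{(s_k - 1) \cdot \frac{s_l}{s_k}}{s_l - 1} = \frac{s_l - \frac{s_l}{s_k}}{s_l - 1} > \frac{s_l - \frac{s_l}{s_k}}{s_l} = 1 - \frac{1}{s_k}.$$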

Thus, $\text{d-rate}(\mathcal{I}_k, C^l, E_2)$ is higher than $1 - \frac{1}{s_k}$. Note that this lower bound does not depend on $l$. Thus, if we want to find $\mathcal{I}$ and $n$ such that $\text{d-rate}(\mathcal{I}, C^l, E_2)$ is at least $\epsilon$ for all $l > n$, then we just have to pick any value of $n$ such that $1 - \frac{1}{s_n} > \epsilon$ (that is, $1 - \epsilon > \frac{1}{s_n}$) and pick $\mathcal{I} = \mathcal{I}_n$. Note that $s_n$ strictly grows with $n$, so such $n$ exists. We conclude $(C_2, E_2) \in$ Class II.

The next example shows that some reduction of the set $C_2$ of the previous example makes us lose the inclusion in Class II.
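Before moving on, the counting and the bound used in Example 4 can be sanity-checked numerically. The sketch below is illustrative only and all function names are hypothetical: `count_consistent` brute-forces the number of response combinations that a deterministic FSM can give (equal input prefixes must yield equal output prefixes) for very small $n$, `s` is the closed form $2^{2^{n+1}-2}$, `d_rate` evaluates the rate expression derived above, and `smallest_n` picks a suitable $n$ for a target $\epsilon$.

```python
from fractions import Fraction
from itertools import product

def s(n: int) -> int:
    """Closed form from Example 4: s_n = |C^n| = 2^(2^(n+1) - 2)."""
    return 2 ** (2 ** (n + 1) - 2)

def count_consistent(n: int) -> int:
    """Brute-force count of response combinations to {a,b}^n that are
    possible in a deterministic FSM. Only practical for very small n."""
    inputs = ["".join(p) for p in product("ab", repeat=n)]
    outputs = ["".join(p) for p in product("01", repeat=n)]
    total = 0
    for combo in product(outputs, repeat=len(inputs)):
        resp = dict(zip(inputs, combo))
        if all(resp[u][:j] == resp[v][:j]
               for u in inputs for v in inputs
               for j in range(1, n + 1) if u[:j] == v[:j]):
            total += 1
    return total

def d_rate(k: int, l: int) -> Fraction:
    """Rate of I_k = {a,b}^k on (C^l, {f1}) for l > k: (s_k - 1)(s_l / s_k) / (s_l - 1)."""
    return Fraction((s(k) - 1) * (s(l) // s(k)), s(l) - 1)

def smallest_n(eps: Fraction) -> int:
    """Smallest n >= 1 with 1 - 1/s_n > eps, assuming eps < 1."""
    n = 1
    while not 1 - Fraction(1, s(n)) > eps:
        n += 1
    return n
```

For instance, the brute-force count agrees with the closed form for $n = 1$ and $n = 2$ (4 and 64 combinations), and a target rate of 9/10 is already guaranteed by $\mathcal{I}_2$, since $1 - 1/64 > 9/10$.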

Example 5. Let us reconsider $(C_2, E_2)$ as defined in Example 4. We modify $C_2$ so that its pair with $E_2$ no longer belongs to Class II. Let $C_2' \subseteq C_2$ consist of $f_1$ as well as all functions $g_k$, with $k \in \mathbb{N}$, such that $g_k$ behaves as $f_1$ for all input sequences but the input sequence $a^k b$ (and its extensions $a^k b s$, for all $s \in \{a, b\}^*$). In particular, the output produced by $g_k$ when input $b$ is given after $a^k$ is the opposite of the one given by $f_1$ (i.e. 0 if it is 1 in $f_1$; 1 otherwise). For every input sequence $a^k b s$, the outputs produced by $g_k$ for the rest of inputs after $a^k b$ are the same as the ones produced by $f_1$. Therefore, for any non-decreasing sequence $C^1 \subseteq C^2 \subseteq \cdots \subseteq C^n \subseteq C^{n+1} \subseteq \cdots$ of subsets of $C_2'$, the ratio $\text{d-rate}(\mathcal{I}, C^n, E^n)$ is arbitrarily low as $n$ increases. This is because, for every finite test suite $\mathcal{I}$, the number of pairs of $\{(f_1, f) \mid f \in C_2' \setminus \{f_1\}\}$ distinguished by $\mathcal{I}$ is finite (in particular, this number is lower than or equal to $|\mathcal{I}|$ because, if some $i \in \mathcal{I}$ distinguishes $f_1$ from some wrong function $f'$, then it cannot distinguish $f_1$ from any other wrong function). As a consequence, no sequence of subsets like that required by Definition 7 can be constructed. Therefore, $(C_2', E_2) \notin$ Class II.
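The collapse of the distinguishing rate in Example 5 can be observed on a concrete instance. The sketch below makes two assumptions that the paper leaves open: $f_1$ is taken to answer 0 to every input symbol, and the chain is the particular one with $C^n = \{f_1, g_0, \ldots, g_{n-1}\}$. The helper names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Iterable, List

def f1(seq: str) -> str:
    """Assumed correct behaviour: answer 0 to every input symbol."""
    return "0" * len(seq)

def g(k: int) -> Callable[[str], str]:
    """Faulty behaviour g_k: as f1, except that the output for the b
    received right after a^k is flipped."""
    def behaviour(seq: str) -> str:
        out = list(f1(seq))
        if seq.startswith("a" * k + "b"):
            out[k] = "1" if out[k] == "0" else "0"
        return "".join(out)
    return behaviour

def distinguished_faults(suite: Iterable[str], n: int) -> List[int]:
    """Indices k < n such that some test in the suite separates g_k from f1."""
    tests = list(suite)
    return [k for k in range(n) if any(g(k)(i) != f1(i) for i in tests)]

def d_rate(suite: Iterable[str], n: int) -> Fraction:
    """Rate on C^n = {f1, g_0, ..., g_(n-1)} with E = {f1} (assumed chain)."""
    return Fraction(len(distinguished_faults(suite, n)), n)

suite = ["b", "ab", "aab", "aaa"]
```

Each test reveals at most one faulty function (the test `aaa` reveals none), so with this four-test suite the rate is 3/10 on $C^{10}$ and 3/1000 on $C^{1000}$: it tends to 0 as the chain grows, whatever finite suite is fixed.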

The two cases considered in the two previous examples illustrate a fact that could look very surprising at first glance: By reducing the computation formalism from $C_2$ to $C_2' \subseteq C_2$ (and thus reducing the number of possible wrong implementations) the unboundedly-approachable finite testability is lost. Intuitively, the reason is that $C_2'$ only includes functions with a single fault, while all FSMs are in $C_2$ (including FSMs strongly deviating from the correct behavior). Thus, distinguishing correct implementations from incorrect ones is easier in $C_2$ than in $C_2'$. This shows that the difficulty of testing does not lie in the number of possible


wrong implementations to be discarded, but in the narrowness of the border between correct and incorrect potential implementations. Due to this narrowness, if the computation formalism is $C_2'$ then testing is not very productive in terms of distinguishability: After applying $n$ tests, no more than $n$ pairs of correct/incorrect functions will be distinguished, out of an infinite number of pairs of correct/incorrect functions to be distinguished.

Two additional elaborated examples concerning Class II are given in Section 5. In Case Study 5.1 we show how the classical framework of M. Hennessy [17] for process algebras can be expressed in our formalism, and we show its relation with Class II. In Case Study 5.2 we present a variant of FSMs where states are also endowed with timeout transitions. Timeouts of states belong to a dense domain and can be arbitrarily close to 0, which implicitly eliminates the possibility to discretize the times when we test the IUT in order to reach completeness. Yet we prove that this case belongs to Class II.

Let us note that Class I serves as an ideal reference class of what we would like to achieve by testing: precisely deciding whether the IUT is right or wrong. Although finite complete testability is not a very likely scenario in practice, knowing Class I (and the properties that allow a problem to be in it) lets us assess other more realistic testing cases which are not in Class I: In general, we will want to be as close to Class I (i.e. as near to fulfilling the properties required to be in the class) as possible. Later we will consider several formal ways to approach Class I, such as assuming hypotheses regarding which functions can be in the set $C$ (Section 4.3). However, note that Class II introduces by itself a practical approximation to Class I. If we are in Class II, then any partial distinguishing rate $< 1$ can be reached by some finite test suite. Note that this notion gives testers a formal measure of the productivity of testing. The tester's intuition says that the usefulness of testing decays as more tests are applied, that is, after applying $n$ test cases, the utility of applying one more test (i.e. the capability of detecting a fault if none was detected so far) decreases with $n$. However, by how much? The construction required by Definition 7 for being in Class II implicitly lets us calculate that measure, because we can use it to define an expression denoting, in each particular case, the relation between the distinguishing rate and the size of test suites reaching that rate. For instance, a simple productivity metric, denoting the marginal utility of applying one more test after applying $n$ test cases, could consist in the derivative of the distinguishing rate with respect to the size of the test suites reaching that rate. Though the distinguishing rate expressions developed before in Example 4, and later in Case Studies 5.1 and 5.2, implicitly let us calculate these relations, explicitly investigating them (as well as possible separations of Class II into subclasses in terms of the kinds of these relations) is out of the scope of this paper and is left as future work. This issue will be revisited later in Section 7, in the conclusions.
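As a hedged illustration of the productivity metric just mentioned, the following sketch assumes a hypothetical function $d(n)$ giving the distinguishing rate reached by the best test suite of size $n$. Neither $d$ nor $\Delta$ is notation from this paper, and the closed form used for $d$ is invented purely for the arithmetic. Since $n$ is discrete, the derivative is read here as a finite difference.

```latex
% Marginal utility of the (n+1)-th test, as a finite difference (assumed notation):
\Delta(n) = d(n+1) - d(n)

% Illustrative (hypothetical) rate d(n) = 1 - \frac{1}{n+1}:
\Delta(n) = \frac{1}{n+1} - \frac{1}{n+2} = \frac{1}{(n+1)(n+2)}
```

Under this assumed rate, $d(n)$ approaches 1 without ever reaching it, and the marginal utility decays quadratically in $n$, which matches the intuition described above.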

4 STUDYING PROPERTIES OF CLASS I

In this section we study the properties of Class I. We show some conditions required for finite testability, and we present some alternative characterizations. Transformations keeping the testability are defined, and the effect of adding testing hypotheses is analyzed. The complexity of finding minimum complete test suites is studied, and we also measure how far a test suite is from being complete. Besides, we study how to reduce a testing problem into another, thus allowing us to find complete test suites for the former in terms of the latter. Some properties can be applied to other testability classes as well, though a specific study of the remaining four classes is left as future work. For the sake of readability, in results presented from now on we will assume that $C$ denotes a computation formalism for a set of inputs $I$ and a set of distinguished outputs $O$, and $E \subseteq C$ is a specification.

4.1 Characterizations of Class I

Next we present some simple conditions enabling/disabling the testability.

Lemma 1. We have the following properties:

a) Let $f \in E$, $f' \in C \setminus E$ be such that for all $i \in I$ we have $o_1 \leftrightarrow o_2$ for some $o_1 \in f(i)$, $o_2 \in f'(i)$. Then, $(C, E) \notin$ Class IV.

b) $(C, E) \notin$ Class I iff for all $n \in \mathbb{N}$ and $\mathcal{I} = \{i_1, \ldots, i_n\} \subseteq I$ there exist $f \in E$, $f' \in C \setminus E$ such that for all $i \in \mathcal{I}$ we have $o_1 \leftrightarrow o_2$ for some $o_1 \in f(i)$, $o_2 \in f'(i)$.

c) Let us suppose that $C$ is a finite computation formalism (that is, $C$ is a finite set), and there do not exist $f \in E$, $f' \in C \setminus E$ such that for all $i \in I$ we have $o_1 \leftrightarrow o_2$ for some $o_1 \in f(i)$, $o_2 \in f'(i)$. Then, $(C, E) \in$ Class I.

d) Let us suppose that $I$ is infinite and for some $f \in E$ we have that, for all $i \in I$, there exists $f' \in C \setminus E$ such that $f(i) \not\leftrightarrow f'(i)$, but $f(i') \leftrightarrow f'(i')$ for all $i' \neq i$ with $i' \in I$. Then, $(C, E) \notin$ Class I.

Proof. Properties (a) and (b) are straightforward. Next we prove (c). Let us consider the set $A = \{(f, f') \mid f \in E \wedge f' \in C \setminus E\}$. Let us note that $A$ is finite because $C$ is finite. Since we assume that there do not exist $f \in E$ and $f' \in C \setminus E$ such that $f(i) \leftrightarrow f'(i)$ for all $i \in I$, for each pair $(f, f') \in A$ there exists some input $i \in I$ that distinguishes $f$ and $f'$, that is, there exists $i \in I$ such that $f(i) \not\leftrightarrow f'(i)$. Thus, a finite set containing, for each pair $(f, f')$, an input that distinguishes $f$ and $f'$ is a finite complete test suite for $C$ and $E$. Thus, $(C, E) \in$ Class I.

Now we prove (d). Let us consider a finite test suite $\mathcal{I} \subseteq I$. Since $\mathcal{I}$ is finite and $I$ is infinite, there is $i \in I$ such that $i \notin \mathcal{I}$. By hypothesis, there is $f' \in C \setminus E$ such that $f'(i) \not\leftrightarrow f(i)$ but $f(i') \leftrightarrow f'(i')$ for all $i' \in I$ with $i \neq i'$. Since $i \notin \mathcal{I}$, $\mathcal{I}$ does not distinguish $f$ and $f'$. Thus, $\mathcal{I}$ cannot be complete. Since this happens for any finite test suite, we conclude $(C, E) \notin$ Class I. □

The previous results are, in fact, a simple rephrasing of definitions given in Section 2. However, they help us to reason about the testability of problems from an abstract point of view, in such a way that some (or most) details concerning the structure of a computation formalism (states, instructions, etc.) can be ignored. The next example illustrates how the previous results can be applied.

RODRÍGUEZ ET AL.: A GENERAL TESTABILITY THEORY: CLASSES, PROPERTIES, COMPLEXITY, AND TESTING REDUCTIONS 871


Example 6. Let $(C_2, E_2)$ be defined as in Example 3. We argued that, in general, $(C_2, E_2) \notin$ Class I. Now we prove it by applying Lemma 1 (b). Let $\mathcal{I} = \{i_1, \ldots, i_n\}$ be any finite set of sequences of $I'$ inputs, and let $k$ be the length of the longest sequence in $\mathcal{I}$. Let $E_2 = \{f\}$ where $f$ denotes the behavior of a deterministic completely-specified FSM $M$. Since the number of states of FSMs represented by $C_2$ is not bounded, there exists $f' \in C_2$ representing the behavior of an FSM $M'$ such that for all $i \in \mathcal{I}$ the response is the same as the one given by $M$, but the answer is different for some sequence of length $k + 1$. Hence, $f \in E_2$ and $f' \notin E_2$ are not distinguished by $\mathcal{I}$. By Lemma 1 (b), $(C_2, E_2) \notin$ Class I.

Let $(C_1, E_1)$ be defined as in Example 3. We provide a simple proof of $(C_1, E_1) \in$ Class I which does not require reasoning about the set of all traces of some length. Since we consider deterministic FSMs, for all distinct $f_1, f_2 \in C_1$ there exists $i \in I$ distinguishing $f_1$ and $f_2$. Thus, for all $f \in E_1$ and $f' \in C_1 \setminus E_1$ there exists some $i \in I$ distinguishing $f$ and $f'$. Since $C_1$ is finite, by Lemma 1 (c) we conclude $(C_1, E_1) \in$ Class I. Let us apply Lemma 1 (c) in a similar way, but in a very different context. Let $P$ denote the set of all deterministic programs written in some programming language with fewer than $k$ characters, $I$ denote the (finite) set of keys in a standard keyboard, and $O$ denote the (finite) set of ASCII symbols with the trivial distinguishing relation. Let $C_1'$ represent the beginning of the behavior of all programs in $P$ as follows: If $f \in C_1'$ represents $p \in P$ then $f(i) = \{o\}$ denotes that $o \in O$ is the first character depicted in the screen by $p$ if a keystroke $i \in I$ is produced at the first execution tick. If no character is depicted after $t$ ticks then we consider $f(i) = \{Null\}$ with $Null \in O$. Let $E_1' = \{f\}$ for some $f \in C_1'$. Since $C_1'$ is finite and all functions in $C_1'$ are distinguishable, again by Lemma 1 (c) we have $(C_1', E_1') \in$ Class I.

Lemma 1 (c) provides some sufficient conditions to achieve finite testability, though they are not necessary. We may also have finite testability when the computation formalism is infinite. The finitely testable pair $(C, E)$ considered in the last paragraph of Example 3 in Section 3, dealing with an infinite set of programs of some kind, illustrated this fact. Next we consider an example in the context of FSMs. Let $C_6$ represent all deterministic FSMs and $E_6 \subseteq C_6$ represent all deterministic FSMs with $I' = \{a, b\}$ such that, if they receive $a$ in their initial state, then they produce $q \in O'$, and any behavior is allowed next. This means that, for every sequence of $I'$ inputs beginning with $a$, the sequence of $O'$ outputs must begin with $q$. We are imposing a requirement over all sequences beginning with $a$, that is, $a, aa, ab, aaa, aab, aba, abb, \ldots \in I$. However, we do not have to test all sequences in this infinite set. In fact, $\{a\}$ is a complete test suite for $(C_6, E_6)$ because, for all $f \in C_6$ (where $f$ may or may not belong to $E_6$), we have the following property: If $f(a) = \{q\}$ then for all $w$ we have $f(aw) = \{qw'\}$ for some $w'$ (otherwise, $f$ does not represent a deterministic FSM, which is required by $C_6$). Thus, the input $a$ distinguishes any pair of functions $f \in E_6$ and $f' \in C_6 \setminus E_6$. Hence, $\{a\}$ is a (finite) complete test suite for the infinite sets $C_6$ and $E_6$, and we have $(C_6, E_6) \in$ Class I.
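The prefix argument behind the test suite $\{a\}$ can be sketched in code. This is only an illustrative check under stated assumptions: FSMs are modelled as Mealy machines stored in a dictionary, the output alphabet is taken to be `{"q", "r"}`, and the names `run`, `random_fsm`, `passes_suite_a`, `satisfies_spec`, and `WORDS` are hypothetical helpers that do not come from the paper. The specification is only checked on words up to a fixed length, so this is a sanity check, not a proof.

```python
import itertools
import random

def run(fsm, word, start=0):
    """Feed `word` to a deterministic Mealy machine and return its output word."""
    state, out = start, []
    for symbol in word:
        state, output = fsm[(state, symbol)]
        out.append(output)
    return "".join(out)

def random_fsm(n_states, rng):
    """Build a random completely-specified deterministic FSM over inputs a, b."""
    return {(s, x): (rng.randrange(n_states), rng.choice("qr"))
            for s in range(n_states) for x in "ab"}

def passes_suite_a(fsm):
    """Apply the single test 'a' and check that the answer is 'q'."""
    return run(fsm, "a") == "q"

def satisfies_spec(fsm, words):
    """Check the E6-style requirement on every given word beginning with 'a'."""
    return all(run(fsm, w).startswith("q") for w in words if w.startswith("a"))

# Assumption: words up to length 4 stand in for the infinite set of sequences.
WORDS = ["".join(p) for n in range(1, 5) for p in itertools.product("ab", repeat=n)]

if __name__ == "__main__":
    rng = random.Random(0)
    machines = [random_fsm(3, rng) for _ in range(100)]
    print(all(passes_suite_a(m) == satisfies_spec(m, WORDS) for m in machines))
```

Because the machine is deterministic, the first output produced for any word `"a" + w` is the output produced for `"a"` alone, so the single test agrees with the check over all sampled words.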

Let us consider again programs written in some programming language. Let $P$ denote the set of all terminating deterministic programs such that, after launching them, they wait to receive a character sequence from the user and then they display a sequence of characters. Let us note that this time we are not constraining the length of programs, so the source code of these programs may have any length. Let $I = O$ denote the set of all sequences of ASCII characters. Let $C_1''$ represent all programs in $P$ as follows: If $f \in C_1''$ represents $p \in P$ then $f(i) = \{o\}$ denotes that the sequence of ASCII symbols $o \in O$ is depicted by $p$ if, after launching $p$, the user writes the sequence $i \in I$ and then presses enter. If no sequence is depicted before the termination of $p$ then we consider $f(i) = \{Null\}$ with $Null \in O$. Let $E_1'' = \{f\}$ consist of a single function $f \in C_1''$. Given any deterministic program $p$ and a sequence of characters $i$, we can trivially construct another deterministic program $p'$ such that $p'$ behaves as $p$ for all sequences of characters but $i$ (we just catch this special case with an if-then-else statement at the beginning). Hence, for all $i \in I$ we can find a function $f' \in C_1'' \setminus E_1''$ such that $f$ and $f'$ are only distinguished by $i$. Since the set of all sequences of ASCII characters is infinite, by Lemma 1 (d) we conclude $(C_1'', E_1'') \notin$ Class I.

Let us revisit $(C_6, E_6)$ given before in this example and let $f \in E_6$ be an arbitrary function. We cannot find, for all $i \in I$, a function $f' \in C_6 \setminus E_6$ such that $f$ and $f'$ are only distinguished by $i$, because $f$ and $f'$ must answer different outputs for infinitely many inputs (in particular, for all sequences beginning with $a$). Thus, in this case we cannot apply Lemma 1 (d) to infer $(C_6, E_6) \notin$ Class I.

Before presenting the next results, let us introduce some notation to facilitate the reading. Next we identify the set of all inputs that distinguish two functions.

Definition 8. Let $f \in E$ and $f' \in C \setminus E$. We define the set of inputs that distinguish $f$ and $f'$, written $\mathit{disobs}(f, f')$, as

\[ \mathit{disobs}(f, f') = \{ i \mid i \in I \wedge f(i) \not\leftrightarrow f'(i) \}. \]
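To make Definition 8 concrete, here is a minimal sketch for finite inputs. It assumes that functions are stored as dictionaries from inputs to sets of outputs, and that two output sets are distinguished only when every pair of their outputs is distinguished (the reading suggested by Lemma 1). The helper names `distinguished`, `disobs`, and `is_complete`, and the sample functions `f`, `g1`, `g2`, are illustrative and not part of the paper. The default relation is the trivial one (two outputs are distinguished iff they differ).

```python
from itertools import product

def distinguished(outs_f, outs_g, dist):
    """Output sets are distinguished only if every pair of outputs is."""
    return all(dist(a, b) for a, b in product(outs_f, outs_g))

def disobs(f, g, inputs, dist=lambda a, b: a != b):
    """Inputs among `inputs` on which f and g give distinguished output sets."""
    return {i for i in inputs if distinguished(f[i], g[i], dist)}

def is_complete(suite, correct, incorrect, dist=lambda a, b: a != b):
    """A suite is complete if it distinguishes every correct/incorrect pair."""
    return all(disobs(f, g, suite, dist) for f in correct for g in incorrect)

# Hypothetical example data: one correct function and two incorrect ones.
INPUTS = ["i1", "i2", "i3"]
f = {"i1": {"x"}, "i2": {"y"}, "i3": {"z"}}
g1 = {"i1": {"x"}, "i2": {"w"}, "i3": {"z"}}
g2 = {"i1": {"x", "w"}, "i2": {"y"}, "i3": {"w"}}

if __name__ == "__main__":
    print(disobs(f, g1, INPUTS), disobs(f, g2, INPUTS))
    print(is_complete(["i2", "i3"], [f], [g1, g2]))
```

Note that `g2` is not distinguished from `f` on `i1`: the non-deterministic answer `{"x", "w"}` may still produce `x`, so the two output sets share an undistinguishable pair.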

The next result analyzes conditions preserving testability, that is, we study how we can modify $C$ or $E$ in such a way that complete test suites are preserved. Let us suppose that distinguishing $f_1 \in E$ and $f_1' \in C \setminus E$ requires applying an input that, in particular, also distinguishes $f_2 \in E$ and $f_2' \in C \setminus E$. Then test suites do not need to care about $f_2$ and $f_2'$ in order to achieve completeness, so we can ignore them. In this way, smaller sets of functions can be considered.

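Lemma 2. (Equi-testability Lemma) Let $C$ be a computation formalism for $I$ and $O$, and let $E \subseteq C$. Let $C' \subseteq C$ be such that for all $f \in E$ and $f' \in C \setminus E$ there exist $g \in E \setminus C'$ and $g' \in (C \setminus E) \setminus C'$ verifying $\mathit{disobs}(g, g') \neq \emptyset$ and $\mathit{disobs}(g, g') \subseteq \mathit{disobs}(f, f')$. Then, $\mathcal{I}$ is a complete test suite for $C$ and $E$ iff $\mathcal{I}$ is a complete test suite for $C \setminus C'$ and $E \setminus C'$.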

Proof.

$\Longrightarrow$: Let $\mathcal{I}$ be a complete test suite for $C$ and $E$. Since $E \setminus C' \subseteq E$ and $(C \setminus E) \setminus C' \subseteq C \setminus E$, $\mathcal{I}$ is a complete test suite for $C \setminus C'$ and $E \setminus C'$.

$\Longleftarrow$: Let $\mathcal{I}$ be a complete test suite for $C \setminus C'$ and $E \setminus C'$. Let us prove that $\mathcal{I}$ is a complete test suite for $C$ and $E$. Let us consider $f \in E$ and $f' \in C \setminus E$. By hypothesis, there exist $g \in E \setminus C'$ and $g' \in (C \setminus E) \setminus C'$ such that $\mathit{disobs}(g, g') \neq \emptyset$ and $\mathit{disobs}(g, g') \subseteq \mathit{disobs}(f, f')$. Since $\mathcal{I}$ is complete for $C \setminus C'$ and $E \setminus C'$, there is $i \in \mathcal{I}$ such that $g(i) \not\leftrightarrow g'(i)$. By the definition of $\mathit{disobs}(g, g')$, we have $i \in \mathit{disobs}(g, g')$. Since $\mathit{disobs}(g, g') \subseteq \mathit{disobs}(f, f')$, again by the definition of $\mathit{disobs}(f, f')$ we conclude $f(i) \not\leftrightarrow f'(i)$. Thus, $\mathcal{I}$ is a complete test suite for $(C, E)$. □

Example 7. We apply Lemma 2 to find out that we can ignore some functions in the search for a complete test suite. Let $C$ consist of some non-deterministic functions for input set $I$ and output set $O$, and let $E = \{f\}$ for some $f \in C$. For some $i \in I$, let $V$ be the set of all functions $f' \in C \setminus E$ such that $f(i)$ and $f'(i)$ are disjoint, and let $g \in V$ be some function such that, for all $i' \in I \setminus \{i\}$, we have $g(i') = f(i')$. Note that for all $f' \in V$ we have $\mathit{disobs}(f, g) = \{i\} \subseteq \mathit{disobs}(f, f')$. Thus, by Lemma 2 we conclude that a set $\mathcal{I} \subseteq I$ is a complete test suite for $(C, E)$ iff $\mathcal{I}$ is a complete test suite for $((C \setminus V) \cup \{g\}, E)$. Hence, all functions in $V$ but $g$ can be ignored, and just the smaller set $(C \setminus V) \cup \{g\} \subseteq C$ has to be considered to seek completeness for $C$. Actually, $((C \setminus V) \cup \{g\}, E) \in$ Class I would imply $(C, E) \in$ Class I.

A very different approach to reduce a testing problem into another will be presented later in Definition 12.

Next we present a result providing a lower bound on the number of inputs that must be included in a set to achieve a complete test suite.

Proposition 1. Let $\mathcal{A} \subseteq 2^I$ be a pairwise disjoint$^{6}$ set of sets of inputs such that for all $A \in \mathcal{A}$ there exist $f \in E$, $f' \in C \setminus E$ verifying $\mathit{disobs}(f, f') \subseteq A$. Then $|\mathcal{I}| \geq |\mathcal{A}|$ for any finite complete test suite $\mathcal{I}$ for $C$ and $E$.

Proof. Let us assume that there exists a complete test suite $\mathcal{I} \subseteq I$ for $(C, E)$. In order to prove the result, we define an injective function $h : \mathcal{A} \mapsto \mathcal{I}$. For any $A \in \mathcal{A}$, let us take $f \in E$ and $f' \in C \setminus E$ fulfilling the premises of the proposition. Since $\mathcal{I}$ is complete, there must be some $i_A \in \mathcal{I}$ satisfying

\[ i_A \in \mathcal{I} \cap \{ i \mid i \in I \wedge f(i) \not\leftrightarrow f'(i) \} \subseteq \mathcal{I} \cap A \]

Let us define $h(A) = i_A$. This function is injective since $\mathcal{A}$ is pairwise disjoint. Thus, $|\mathcal{I}| \geq |\mathcal{A}|$. □

Let us note that if $\mathcal{A}$ is not finite then, by applying the previous proposition, we immediately infer $(C, E) \notin$ Class I. The next example uses this property.

Example 8. Let $(C_2, E_2)$ be defined as in Example 3. In particular, $E_2 = \{f\}$ consists of a single correct function, $I = I'^{*}$ denotes the set of all sequences of $I'$ inputs, and $a, b \in I'$ denote any two inputs. Since $C_2$ does not limit the number of states of represented FSMs, for every sequence $i \in I'^{*}$ we can find two FSMs represented by $C_2$ whose answers differ only for $i$ (as well as its extensions $is$ for all $s \in I'^{*}$). Let $Q_k = \{a^k b s \mid s \in I'^{*}\}$. For all $k \in \mathbb{N}$ we can find a function $f' \in C_2 \setminus E_2$ such that only those input sequences that are included in $Q_k$ distinguish $f$ and $f'$ (in particular, this is so if $f'$ behaves as $f$ for all input sequences but $a^k b$, where a wrong output is produced). For all $k_1, k_2 \in \mathbb{N}$ with $k_1 \neq k_2$ we have $Q_{k_1} \cap Q_{k_2} = \emptyset$. Thus, the infinite set $\mathcal{A} = \{Q_k \mid k \in \mathbb{N}\}$ fulfills the conditions of Proposition 1. Hence, we rediscover $(C_2, E_2) \notin$ Class I.

Interestingly, we may have $(C, E) \notin$ Class I even if there does not exist any infinite set $\mathcal{A}$ fulfilling the conditions of Proposition 1 (that is, the largest set fulfilling the conditions is finite). Let us show it by explicitly constructing such a case. Let $C$ be a computation formalism and $E \subseteq C$ be a specification such that $C$ and $C \setminus E$ are infinite sets. Each input will be decorated with the names of two correct and two incorrect functions. Specifically,

\[ I = \{ i_{x,y}^{x',y'} \mid x, y \in E \wedge x', y' \in C \setminus E \}. \]

Besides, we consider $O = \{ is_x \mid x \in C \} \cup \{ notdist_{x,y} \mid x, y \in C \}$. As we will see, an output of the form $is_f$ will identify the function $f$ emitting it, because it will be emitted only by $f$. Besides, an output of the form $notdist_{x,y}$ will make functions $x$ and $y$ not be distinguished by the input leading to emit that output, because both $x$ and $y$ will emit it. We will assume that the ordering of indexes in inputs and outputs is irrelevant, so e.g. we assume $i_{f,g}^{f',g'} = i_{f,g}^{g',f'}$ and $notdist_{h,h'} = notdist_{h',h}$. Note that $I$ and $O$ are infinite sets.

For every function $q \in C$ we define

\[ q(i_{f,g}^{f',g'}) = \{is_q\} \cup \{ notdist_{q,h} \mid h \in C \setminus \{f, f', g, g'\} \} \quad \text{if } q \in \{f, f', g, g'\}; \]

otherwise $q(i_{f,g}^{f',g'}) = O$. According to this construction, input $i_{f,g}^{f',g'}$ distinguishes $h$ and $h'$ if $h \in \{f, g\}$ and $h' \in \{f', g'\}$ because we have $h(i_{f,g}^{f',g'}) \cap h'(i_{f,g}^{f',g'}) = \emptyset$. Otherwise, i.e. if $h \notin \{f, g\}$ or $h' \notin \{f', g'\}$, we have $h(i_{f,g}^{f',g'}) \cap h'(i_{f,g}^{f',g'}) \neq \emptyset$, so $i_{f,g}^{f',g'}$ does not distinguish $h$ and $h'$. Let $W$ be the condition, required by Proposition 1, that for all $A \in \mathcal{A}$ there exist $f \in E$, $f' \in C \setminus E$ verifying $\{ i \mid i \in I \wedge f(i) \not\leftrightarrow f'(i) \} \subseteq A$. Any set of inputs $A$ fulfilling condition $W$ must include, for some pair of functions $f, f'$, all inputs that distinguish $f, f'$. For any other set $A'$ including all inputs that distinguish another pair $g, g'$, the set $A$ shares at least one input with $A'$ (in particular, $A$ and $A'$ share the input $i_{f,g}^{f',g'}$). Thus, there do not exist two disjoint sets $A$ and $A'$ fulfilling condition $W$. Since Proposition 1 also requires that all sets of $\mathcal{A}$ are pairwise disjoint, the conditions of the proposition are fulfilled only if $\mathcal{A}$ consists of a single set (e.g. $\mathcal{A} = \{I\}$), i.e. $\mathcal{A}$ cannot be infinite. However, there does not exist any finite complete test suite for $C$ and $E$ either: Each input $i_{f,g}^{f',g'}$ distinguishes only four pairs (in particular, any combination of $f$ or $g$ with $f'$ or $g'$), but there are infinitely many pairs to be distinguished (recall that we are assuming that $E$ and $C \setminus E$ are infinite sets). Thus, $(C, E) \notin$ Class I.

Next we consider an alternative way to find complete test suites by manipulating the sets distinguishing each pair of functions.

Proposition 2. Let $G$ be the set of all sets of inputs that distinguish each correct function from each incorrect function, that is:

\[ G = \{ \mathit{disobs}(f, f') \mid f \in E \wedge f' \in C \setminus E \} \]

We have $(C, E) \in$ Class I iff there exist $n \in \mathbb{N}$ subsets of $G$, denoted by $A_1, \ldots, A_n \subseteq G$, such that $G = \bigcup_{1 \leq i \leq n} A_i$ and $\bigcap_{B \in A_j} B \neq \emptyset$ for all $1 \leq j \leq n$.

Moreover, if such $n$ subsets of $G$ exist, then there is a complete test suite $\mathcal{I}$ such that $|\mathcal{I}| \leq n$.

6. That is, for all $A, A' \in \mathcal{A}$ we have $A = A'$ or $A \cap A' = \emptyset$.

Proof.

$\Longrightarrow$: If $(C, E) \in$ Class I then there exists a finite complete test suite $\{i_1, \ldots, i_n\}$ for $C$ and $E$. For each $i_j$, let $A_j$ be the set of all sets of inputs distinguishing a correct and an incorrect function such that $i_j$ is included, that is, $A_j = \{ B \mid B \in G \wedge i_j \in B \} \subseteq G$. We have $i_j \in \bigcap_{B \in A_j} B$, so $\bigcap_{B \in A_j} B \neq \emptyset$. For all $B \in G$, we have $B \cap \{i_1, \ldots, i_n\} \neq \emptyset$ (otherwise, some pair of functions would not be distinguished by $\{i_1, \ldots, i_n\}$ and it would not be a complete test suite). So, for some $1 \leq j \leq n$ we have $i_j \in B \cap \{i_1, \ldots, i_n\}$. Thus, by the construction of sets $A_j$, for all $B \in G$ we have $B \in A_j$ for some $1 \leq j \leq n$, and we conclude $\bigcup_{1 \leq j \leq n} A_j = G$.

$\Longleftarrow$: This implication is easy to prove: We can build a finite complete test suite by taking, for all $1 \leq j \leq n$, any input from the set $\bigcap_{B \in A_j} B \neq \emptyset$. Let us note that the size of the resulting test suite is at most $n$. □

In the following example we show an application of the previous property.

Example 9. Let us consider $I = \mathbb{N}$, $O = \{0, 1\}$, and for all $i \in \mathbb{N}^{+}$ the functions $f_i(x) : \mathbb{N} \mapsto 2^{\{0,1\}}$ defined as:

\[ f_i(x) = \begin{cases} \{1\} & \text{if } x \bmod i = 0, \\ \{0\} & \text{if } x \bmod i \neq 0. \end{cases} \]

Let $C = \{ f_i \mid i \in \mathbb{N}^{+} \}$ and $E = \{f_6\}$. Let us consider $B_i = \mathit{disobs}(f_i, f_6)$ for $i > 0$, $i \neq 6$. We check that the sets $A_1 = \{B_1\}$, $A_2 = \{B_2\}$, $A_3 = \{B_3\}$ and $A_4 = \{ B_i \mid i > 0 \wedge i \notin \{1, 2, 3, 6\} \}$ satisfy the hypothesis of Proposition 2.

First we check that $G = A_1 \cup A_2 \cup A_3 \cup A_4$. This is obvious because $G = \{ B_i \mid i > 0 \wedge i \neq 6 \}$. Now let us check that $\bigcap_{B \in A_i} B \neq \emptyset$ for $i \in \{1, 2, 3, 4\}$. Since $|A_i| = 1$ for $i \leq 3$, this condition is trivial for these sets. Finally, we have $6 \in B_i$ for $i \notin \{1, 2, 3, 6\}$, so $\bigcap_{B \in A_4} B \neq \emptyset$.

Thus, by Proposition 2 we conclude $(C, E) \in$ Class I. Moreover, we can easily extract a complete test suite from sets $A_1$, $A_2$, $A_3$, $A_4$: In order to distinguish all pairs of functions $(f_i, f_6)$, for each set $A_i$ we just have to pick any element from the intersection of all sets in $A_i$. In fact, $\{1, 2, 3, 6\}$ is a complete test suite for $C$ and $E$.
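The claims of Example 9 can be sanity-checked numerically. The sketch below is not a proof: it only inspects the functions $f_i$ for $i$ up to an arbitrary cut-off `BOUND`, which is an assumption of this illustration, and the names `f`, `distinguishes`, `SUITE`, and `missed` are hypothetical helpers. Since every output set here is a singleton and the relation is the trivial one, two answers are distinguished exactly when they differ.

```python
def f(i):
    """f_i from Example 9: {1} on multiples of i, {0} elsewhere."""
    return lambda x: {1} if x % i == 0 else {0}

def distinguishes(x, i, spec=6):
    """True if input x distinguishes f_i from the specification f_spec."""
    return f(i)(x) != f(spec)(x)

SUITE = [1, 2, 3, 6]
BOUND = 1000  # assumption: finite cut-off used only for this sanity check

# Indices i whose function f_i is NOT told apart from f_6 by the suite.
missed = [i for i in range(1, BOUND + 1)
          if i != 6 and not any(distinguishes(x, i) for x in SUITE)]

if __name__ == "__main__":
    print(missed)
```

An empty `missed` list means that, within the inspected range, every incorrect function is distinguished from $f_6$ by some input of the suite.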

4.2 Minimum Test Suites

Next we consider the problem of finding the minimum complete test suite in a particularly basic case:

(a) the computation formalism $C = \{f_1, \ldots, f_n\}$ is finite,
(b) the sets $I = \{i_1, \ldots, i_k\}$ and $O = \{o_1, \ldots, o_l\}$ of inputs and outputs are finite as well.

Before continuing, let us note that any function $f : I \mapsto 2^O$ can be considered as a set of pairs $f \subseteq I \times 2^O$ such that for all $i \in I$ there is a single $outs \in 2^O$ with $(i, outs) \in f$. Therefore, any finite function can be extensionally defined as the set of tuples $f = \{(i_1, outs_1), \ldots, (i_k, outs_k)\}$. Since we have already enumerated the set of inputs $I = \{i_1, \ldots, i_k\}$, the function $f$ can be implicitly defined by the tuple $(outs_1, \ldots, outs_k)$. Thus, in the rest of the paper we will identify any finite function $f \in C$ with such a tuple: $f = (outs_1, \ldots, outs_k)$.

We will call the problem of finding a complete test suite of some size or smaller under the previous conditions the Complete Suite problem, and we will denote it by CS. So, the corresponding optimization problem consists in finding the minimum complete test suite.

Definition 9. Let $C$ be a finite computation formalism for the finite set of inputs $I$ and the finite set of outputs $O$ with a distinguishing relation $\not\leftrightarrow$, and let $E \subseteq C$ be a specification.

Given $C$, $E$, $I$, $O$, $\not\leftrightarrow$, and some $K \in \mathbb{N}$, the Complete Suite problem (CS) is defined as follows: Is there any complete test suite $\mathcal{I}$ for $C$ and $E$ such that $|\mathcal{I}| \leq K$?

Theorem 1. CS $\in$ NP-complete.

The proof of this theorem is given in Section 6. Proving CS $\in$ NP is straightforward, whereas CS $\in$ NP-hard is proved by reducing the Set Cover problem to CS.

This result shows us that, in general, finding the minimum complete test suite (if it exists) is a hard problem, as the NP-completeness of the decision problem implies the NP-hardness of the corresponding optimization problem. We immediately infer that other useful generalizations of this optimization problem are also NP-hard. For instance, the following problems generalize it, so they are trivially NP-hard as well: finding the minimum test suite reaching some given distinguishing rate $d \leq 1$ (see Definition 6); finding the test suite which maximizes some linear combination of the distinguishing rate and (the inverse of) the number of test cases in the suite; or doing that under the assumption that test cases might have different costs and these costs are explicitly given.
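For small finite instances, the optimization version of CS can still be solved by exhaustive search. The following is a minimal brute-force sketch, not the reduction used in the proof: it assumes the trivial distinguishing relation (two output sets are distinguished iff they are disjoint), represents each function as a tuple of output sets indexed by input position as described above, and uses hypothetical names (`minimum_complete_suite`, `spec`, `wrong`). Its running time is exponential in the number of inputs, which is consistent with the hardness result.

```python
from itertools import combinations

def minimum_complete_suite(n_inputs, correct, incorrect):
    """Return a smallest set of input indices distinguishing every
    correct/incorrect pair, or None if no complete suite exists."""
    for size in range(n_inputs + 1):
        for suite in combinations(range(n_inputs), size):
            if all(any(f[i].isdisjoint(g[i]) for i in suite)
                   for f in correct for g in incorrect):
                return set(suite)
    return None

# Hypothetical instance: one correct function and three incorrect ones,
# each written as the tuple (outs_1, outs_2, outs_3).
spec = ({"x"}, {"x"}, {"x"})
wrong = [
    ({"y"}, {"x"}, {"x"}),  # only detectable with input 0
    ({"x"}, {"y"}, {"x"}),  # only detectable with input 1
    ({"y"}, {"y"}, {"x"}),  # detectable with input 0 or input 1
]

if __name__ == "__main__":
    print(minimum_complete_suite(3, [spec], wrong))
```

If some incorrect function is never distinguished from the correct one, for instance one that may always produce `x`, the search returns `None`.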

Several testing techniques face the problem of measuring the (a priori) capability of test suites to find faults (e.g., Mutation Testing [11], [21]), and thus they also support methodologies to select good test suites: We pick those test suites reaching the highest score according to some fault-detection capability measure. Theorem 1 shows us that these tasks will be hard in general. Note that, although CS is defined in terms of our completeness criterion (i.e., distinguishing all pairs of correct/incorrect functions), CS would remain NP-complete even if we assumed the more particular case where there is a single acceptable behavior and all other possible behaviors are incorrect.$^{7}$ In this case, we would not speak about distinguishing correct/incorrect pairs of functions, but just about discarding faulty implementations, which actually fits into other test-suite fault-detection metrics used in the literature.$^{8}$ However, the problem would still be NP-complete in this particular case. Of course, the problem may be polynomial if only some particular kind of systems is considered.

7. In the proof of NP-completeness of CS given in Section 6, the polynomial reduction used to prove its NP-hardness considers a single incorrect function. Since the problem treats correct and incorrect functions symmetrically, that reduction also works if correct and incorrect function roles are trivially exchanged so that a single correct function is considered.

8. For instance, in the simplest form of Mutation Testing, a single program behavior is considered correct, which is the behavior of the specification program (obviously, it is also the behavior of any other equivalent program). All other behaviors are considered incorrect. Then, a measure of the quality of a test suite is the portion of incorrect functions (behaviors) detected by the suite (i.e. the number of incorrect mutants it kills).

4.3 Using Testing Hypotheses to Enable Finite Testability

In this section we present some notions allowing us to reason about testing hypotheses, which play a key role in formal testing methodologies. Assuming a testing hypothesis typically consists in making an assumption about the set of systems that could actually be the IUT. As we have seen in several examples, the testability is affected by assuming some restrictions about the IUT (e.g., testing deterministic FSMs is not in Class I, but it is if a given limit on the number of states is assumed). Thus, assuming a testing hypothesis might allow an incomplete test suite $\mathcal{I}$ to become complete (provided that the hypothesis actually holds). Next we study the conditions for this. We will denote a hypothesis by the set $H$ of systems we remove from those that could actually be the IUT if the hypothesis is assumed.

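Definition 10. A testing hypothesis $H$ for $C$ is a subset $H \subseteq C$.

Let $\mathcal{I} \subseteq I$ not be a complete test suite for $C$ and $E$. If $\mathcal{I}$ is a complete test suite for $C \setminus H$ and $E \setminus H$, then we say that $H$ enables $\mathcal{I}$ for $(C, E)$.

Let $(C, E) \notin$ Class I. If $(C \setminus H, E \setminus H) \in$ Class I then we say that $H$ enables $(C, E)$.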

Lemma 3. Let $\mathcal{I} \subseteq I$ be a finite set of inputs that is not a complete test suite for $C$ and $E$ and let $H$ be a testing hypothesis for $C$. $H$ enables $\mathcal{I}$ for $(C, E)$ iff, for all $f \in E$ and $f' \in C \setminus E$ such that $f(i) \leftrightarrow f'(i)$ for all $i \in \mathcal{I}$, either $f \in H$ or $f' \in H$ holds.

Proof.

$\Longrightarrow$: Let us consider $f \in E$ and $f' \in C \setminus E$ such that for all $i \in \mathcal{I}$ we have $f(i) \leftrightarrow f'(i)$. Since $\mathcal{I}$ is a complete test suite for $(C \setminus H, E \setminus H)$, we infer $f \notin E \setminus H$ or $f' \notin (C \setminus E) \setminus H$. So $f \in H$ or $f' \in H$.

$\Longleftarrow$: In order to check that $\mathcal{I}$ is complete for $(C \setminus H, E \setminus H)$, let us consider $f \in E \setminus H$ and $f' \in (C \setminus H) \setminus (E \setminus H) = (C \setminus E) \setminus H$. Since $f \notin H$ and $f' \notin H$, there is $i \in \mathcal{I}$ such that $f(i) \not\leftrightarrow f'(i)$. Then $\mathcal{I}$ is complete for $(C \setminus H, E \setminus H)$. □

In Section 3.2 we proposed to use the distinguishing rate (see Definition 7) to measure the coverage of an incomplete finite test suite. Alternatively, next we consider measuring the coverage of an incomplete test suite $\mathcal{I}$ in terms of the amount of potential functions that should be removed from $C$ to make $\mathcal{I}$ complete, that is, in terms of the number of potential implementations we have to assume not to be the actual IUT to make $\mathcal{I}$ complete. By removing functions from $C$, the number of pairs of correct/incorrect functions to be distinguished is reduced. This may turn an incomplete test suite into a complete one or, moreover, enable finite testability in a problem that did not belong to Class I before removing functions.

According to this idea, we consider that an incomplete test suite is better than another incomplete suite if making the former suite complete requires removing fewer functions. That is, the suite requiring a weaker function removal assumption to be complete is better. As we have said before, testing hypotheses are assumed precisely to produce this effect, that is, to reduce the number of potential implementations so that finite test suites can be complete, provided that the hypotheses hold. Thus, if a test suite requires fewer or weaker hypotheses to be assumed (i.e., fewer potential implementations have to be removed) to get completeness than another, then the former test suite is better because it is closer to being complete. The next example illustrates this idea.

Example 10. Let us suppose that there are only two correct functions $f_1$ and $f_2$ in $E$, and only two incorrect functions $f_1'$ and $f_2'$ in $C \setminus E$. Let us suppose that test suite $\mathcal{I}_1$ does not distinguish $f_2$ from $f_1'$ or $f_2'$, but it distinguishes all remaining pairs of correct and incorrect functions. Besides, $\mathcal{I}_2$ does not distinguish $f_1$ from $f_1'$ nor $f_2$ from $f_2'$, but it distinguishes all other correct-incorrect pairs. Note that the number of pairs not distinguished by each test suite is 2 in both cases, so the distinguishing rate (see Definition 6) would not allow us to prefer $\mathcal{I}_1$ over $\mathcal{I}_2$ or the other way around. Besides, let us assume that

• The system represented by $f_1$ is deterministic; it always handles some user's request $Q$ in the same way (that is, the way it handles $Q$ is the same in all system configurations); and it implements correctly some functionality $Z$.

• $f_2$ is non-deterministic; it does not always handle $Q$ in the same way (i.e., the way it handles $Q$ is different at different system configurations); and it implements correctly functionality $Z$.

• $f_1'$ is deterministic; it always handles $Q$ in the same way; and it implements incorrectly functionality $Z$.

• $f_2'$ is deterministic; it does not always handle $Q$ in the same way; and it implements correctly functionality $Z$.

Let us suppose that the tester can assume (or not) the following testing hypotheses: (i) the IUT is deterministic; (ii) request $Q$ is always handled in the same way; (iii) functionality $Z$ is correctly implemented. Besides, the tester assumes that the feasibility of each of these hypotheses is very similar. Which test suite ($\mathcal{I}_1$ or $\mathcal{I}_2$) requires the smallest set of hypotheses to be complete? We can see that $\mathcal{I}_1$ would become complete if hypothesis (i) were assumed: If $f_2$ can be discarded, then both pairs not distinguished by $\mathcal{I}_1$ (i.e., $(f_2, f_1')$ and $(f_2, f_2')$) no longer exist, so $\mathcal{I}_1$ becomes a complete test suite. On the other hand, $\mathcal{I}_2$ would become complete if hypotheses (i) and (iii) were assumed, as they would let us discard $f_2$ and $f_1'$, respectively (so all pairs not distinguished by $\mathcal{I}_2$ would no longer exist and $\mathcal{I}_2$ would be complete). The test suite $\mathcal{I}_2$ would also become complete if hypotheses (ii) and (iii) were assumed, but there is no hypothesis which achieves that alone. Thus, $\mathcal{I}_1$ requires a smaller set of hypotheses than $\mathcal{I}_2$ to become complete. If we can assume that all three hypotheses have the same feasibility and all are independent from each other, then we can consider that $\mathcal{I}_1$ is a better test suite than $\mathcal{I}_2$, because it is closer to being complete.
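The comparison made in Example 10 can be replayed mechanically with the criterion of Lemma 3: a set of hypotheses enables a suite iff every pair left undistinguished by the suite has at least one member removed. In the sketch below, the mapping `REMOVED_BY` (which functions each hypothesis discards) is derived from the descriptions of $f_1$, $f_2$, $f_1'$, $f_2'$ given in the example; the encoding and all identifiers are assumptions of this illustration.

```python
from itertools import combinations

# Functions discarded by each hypothesis, read off from Example 10:
# (i) removes non-deterministic systems, (ii) removes systems that do not
# always handle Q in the same way, (iii) removes systems implementing Z wrongly.
REMOVED_BY = {"i": {"f2"}, "ii": {"f2", "f2'"}, "iii": {"f1'"}}

# Correct/incorrect pairs that each suite fails to distinguish.
UNDISTINGUISHED = {
    "I1": [("f2", "f1'"), ("f2", "f2'")],
    "I2": [("f1", "f1'"), ("f2", "f2'")],
}

def enables(hyps, suite):
    """Lemma 3 criterion: each undistinguished pair loses a member."""
    removed = set().union(*(REMOVED_BY[h] for h in hyps))
    return all(f in removed or g in removed for f, g in UNDISTINGUISHED[suite])

def smallest_enabling_sets(suite):
    """All hypothesis sets of minimum size that make `suite` complete."""
    names = sorted(REMOVED_BY)
    for size in range(1, len(names) + 1):
        found = [set(c) for c in combinations(names, size) if enables(c, suite)]
        if found:
            return found
    return []

if __name__ == "__main__":
    print(smallest_enabling_sets("I1"))
    print(smallest_enabling_sets("I2"))
```

Under this encoding, a single hypothesis suffices for $\mathcal{I}_1$ while $\mathcal{I}_2$ needs two, in agreement with the conclusion of the example.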

Next we study the difficulty of assessing the coverage measure of a test suite in these terms, and we do it in the same context as when we defined the CS problem in Definition 9, that is, $C$, $E$, $I$, $O$ are finite. In the next definition, the set $\mathcal{H} = \{H_1, \ldots, H_n\}$ represents the hypotheses the tester may or may not assume: If the hypothesis $H_i \subseteq C$ is assumed then all functions in $H_i$ are assumed not to be in the actual computation formalism (e.g. if we are assuming that the IUT is deterministic then we assume some $H \subseteq C$, where $H$ consists of all non-deterministic functions in $C$).

Definition 11. Let $C$ be a finite computation formalism for the finite set of inputs $I$ and the finite set of outputs $O$ with the distinguishing relation $\nleftrightarrow$, and let $E \subseteq C$ be a finite specification. In addition, let $\mathcal{I} \subseteq I$ be a set of inputs and $\mathcal{H} = \{H_1, \ldots, H_n\}$, where for all $1 \leq i \leq n$ we have $H_i \subseteq C$.

Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, and some $K \in \mathbb{N}$, the Function Removal problem (FR) is defined as follows: Is there any set $R \subseteq C$ with $|R| \leq K$ such that $\mathcal{I}$ is a complete test suite for $C \setminus R$ and $E \setminus R$?

Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, $\mathcal{H}$, and some $K \in \mathbb{N}$, the Function removal via Hypotheses problem (FH) is defined as follows: Is there any set of hypotheses $R \subseteq \mathcal{H}$ with $|\bigcup_{H \in R} H| \leq K$ such that $\mathcal{I}$ is a complete test suite for $C \setminus (\bigcup_{H \in R} H)$ and $E \setminus (\bigcup_{H \in R} H)$?

Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, $\mathcal{H}$, and some $K \in \mathbb{N}$, the Hypotheses Assumption problem (HA) is defined as follows: Is there any set $R \subseteq \mathcal{H}$ with $|R| \leq K$ such that $\mathcal{I}$ is a complete test suite for $C \setminus (\bigcup_{H \in R} H)$ and $E \setminus (\bigcup_{H \in R} H)$?

The differences among the previous problems (and the corresponding coverage measures they allow us to calculate) are the following. In FR, the coverage is measured in terms of the number of functions that must be removed, where any subset of $C$ is allowed to be removed. In FH, the number of functions is considered again, though this time the set of functions to be removed must be the union of some hypotheses (sets of functions) from a given hypotheses repertory $\mathcal{H}$. Finally, HA considers the coverage in terms of the number of assumed hypotheses, rather than the number of removed functions. At first glance, one might think that FR, FH, and HA are all NP-complete problems, and so their corresponding optimization problems are all NP-hard. Interestingly, FR is solvable in polynomial time (and hence not NP-complete unless P = NP), as it can be reduced to the minimum Vertex Cover problem in bipartite graphs, which in turn is equivalent to the maximum Matching problem. This problem can be solved in polynomial time by the Hopcroft-Karp algorithm [20]. Thus, measuring the coverage of an incomplete test suite in terms of the minimum number of functions that must be removed to achieve completeness is indeed a tractable problem. The proof of the next result, given in Section 6, introduces the proposed reduction, and uses it to actually find the minimum function removal in time $O(|C|^{5/2} + |C|^2 \cdot |\mathcal{I}| \cdot |O|^2)$, thus solving FR and its corresponding optimization version polynomially. On the other hand, the NP-completeness of FH and HA is proved by constructing polynomial reductions from 3-SAT and Set Cover, respectively, which implies that their optimization versions are NP-hard problems.
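To illustrate why FR is tractable, the following minimal Python sketch computes the minimum number of functions to remove as the size of a maximum matching in the bipartite graph of undistinguished (correct, incorrect) pairs, which equals the size of a minimum vertex cover by Kőnig's theorem. It is only a sketch under stated assumptions, not the construction of Section 6: functions are encoded as dictionaries from inputs to sets of possible outputs, two functions are assumed to be distinguished by an input when their output sets are disjoint (that is, the distinguishing relation is plain inequality), and a simple augmenting-path matching is used in place of Hopcroft-Karp for brevity. The names `distinguished` and `min_function_removal` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Function = Dict[str, FrozenSet[str]]  # input -> set of possible outputs


def distinguished(f: Function, g: Function, suite: Iterable[str]) -> bool:
    """Assumed criterion: some input of the suite yields disjoint output sets."""
    return any(f[i].isdisjoint(g[i]) for i in suite)


def min_function_removal(formalism: Dict[str, Function],
                         spec: Set[str],
                         suite: Iterable[str]) -> int:
    """Minimum |R| such that the suite is complete for C minus R and E minus R."""
    suite = list(suite)
    correct = [name for name in formalism if name in spec]
    wrong = [name for name in formalism if name not in spec]
    # Bipartite graph: an edge joins a correct and an incorrect function
    # that the suite fails to distinguish.
    edges: Dict[str, List[str]] = {
        c: [w for w in wrong
            if not distinguished(formalism[c], formalism[w], suite)]
        for c in correct
    }
    matched_to: Dict[str, str] = {}  # incorrect function -> correct function

    def augment(c: str, seen: Set[str]) -> bool:
        # Try to extend the matching with an augmenting path starting at c.
        for w in edges[c]:
            if w in seen:
                continue
            seen.add(w)
            if w not in matched_to or augment(matched_to[w], seen):
                matched_to[w] = c
                return True
        return False

    # Maximum matching size = minimum vertex cover size (Koenig's theorem).
    return sum(1 for c in correct if augment(c, set()))


# Hypothetical instance: f1 is the specification, f2 and f3 are faulty.
C = {
    "f1": {"i": frozenset({"a"})},
    "f2": {"i": frozenset({"a", "b"})},
    "f3": {"i": frozenset({"b"})},
}
print(min_function_removal(C, {"f1"}, ["i"]))  # 1
```

Under these assumptions, an FR instance with bound $K$ is answered by checking whether the returned value is at most $K$.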

Theorem 2. We have the following properties:

(a) FR $\in$ P

(b) FH $\in$ NP-complete

(c) HA $\in$ NP-complete

The proof of this result is given in Section 6.
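Since HA is NP-complete, no polynomial algorithm is known for it, but small hypothesis repertories can still be explored exhaustively. The following Python sketch enumerates sets of hypotheses by increasing size and returns the first one that makes the suite complete. It assumes the same hypothetical encoding as before (functions as dictionaries from inputs to output sets, distinction by disjoint output sets); the function names and the toy instance are illustrative only and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Function = Dict[str, FrozenSet[str]]  # input -> set of possible outputs


def is_complete(formalism: Dict[str, Function], spec: Set[str],
                suite: List[str], removed: Set[str]) -> bool:
    """True if every remaining (correct, incorrect) pair is distinguished."""
    correct = [n for n in formalism if n in spec and n not in removed]
    wrong = [n for n in formalism if n not in spec and n not in removed]
    return all(
        any(formalism[c][i].isdisjoint(formalism[w][i]) for i in suite)
        for c in correct for w in wrong
    )


def fewest_hypotheses(formalism: Dict[str, Function], spec: Set[str],
                      suite: List[str],
                      hypotheses: Dict[str, Set[str]]) -> Optional[Tuple[str, ...]]:
    """Exhaustive search (exponential in the number of hypotheses)."""
    names = sorted(hypotheses)
    for size in range(len(names) + 1):
        for chosen in combinations(names, size):
            # Assuming a hypothesis removes all of its functions.
            removed = set().union(*(hypotheses[h] for h in chosen))
            if is_complete(formalism, spec, suite, removed):
                return chosen
    return None


# Hypothetical instance: f1, f2 are correct; g1, g2 are faulty.
C = {
    "f1": {"i": frozenset({"a"})},
    "f2": {"i": frozenset({"b"})},
    "g1": {"i": frozenset({"b"})},
    "g2": {"i": frozenset({"b", "c"})},
}
H = {"H1": {"f2"}, "H2": {"g1"}, "H3": {"g2"}}
print(fewest_hypotheses(C, {"f1", "f2"}, ["i"], H))  # ('H1',)
```

An HA instance with bound $K$ is then answered positively when the returned tuple has at most $K$ elements.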

The distinguishing rate notion given in Definition 6, and the previous three measures based on assessing how many (groups of) functions we should assume not to be the IUT to reach completeness, attempt to answer the question of how far an incomplete test suite is from being complete. Since they do so in an abstract manner (i.e., they do not rely on numbers of states, transitions, code lines, etc.), they are general enough to be applied to any testing setting.

In particular, let us consider a setting where testing is particularly costly because, e.g., each test case takes a long time, each test wastes some resources, etc. Let us suppose that a finite set of test cases is designed according to the testers’ experience. Similarly, a finite fault model is constructed to denote some representative right and wrong definitions of the IUT (or just a set of its possible faults, which may be combined in the IUT in any possible way). Thus, the particular setting required in the definitions of problems CS, FR, FH, or HA (Definitions 9 and 11), which require that the sets of possible test cases and possible IUT definitions are finite and explicitly given, is met. Given a manually designed set of test cases such that applying all test cases in the set is not feasible (due to time, money, etc.), the problem of selecting a good subset from the set necessarily arises. Hence, the proposed notions of distinguishing rate and least required hypothesis providing completeness provide a general criterion to pick some of these test cases in terms of whether the set of selected test cases will have (a priori) a good capability to detect faults.

Similarly, as we said in the case of CS, we could also consider some useful variants of problems FR, FH, or HA. For instance, it is trivial to see that FH and HA would also remain NP-complete if we assigned weights to each function or hypothesis, respectively.9 Note that weights would allow the tester to express that assuming some hypothesis A does not have the same likelihood as assuming another hypothesis B.

4.4 Testing Reductions

In this section we present a notion to relate different testing scenarios. Given a computation formalism, we consider the problem of finding a complete test suite for any specification belonging to a given set of specifications. This is the typical goal of testing methodologies: for any specification fitting into a given kind of specifications (for us, a computation formalism), find a way to derive tests from the specification in such a way that the test suite is complete. Moreover, rather than imposing a fixed computation formalism for all possible specifications, a (possibly infinite) set of pairs $(C, E)$ will denote the cases to be considered. This will improve the generality of the framework. For instance, if the tester assumes that the IUT includes at most $n$ faults with respect to the specification, then a different set $C$ should be considered for each possible set $E$. So, for us a testing problem will be a set of pairs $(C, E)$, and the goal of a testing problem will be, given any pair in the set, finding a complete test suite for that pair. Ideally, our knowledge about how to find suitable tests for a given testing problem (i.e., for a given kind of

9. The question of whether FR would still be polynomial if each function had a different weight is still open, and will be addressed in future work.


target formalisms and specifications) could help us to face other testing problems. In order to enable this, we introduce a general notion of testability reduction. If a testing problem can be solved by transforming it into another testing problem and solving the latter, then we will say that the former problem can be reduced to the latter. This provides a criterion to classify problems inside each class.

Definition 12. A testing problem $\mathcal{T}$ for a set of inputs $I$ is a set of pairs $(C, E)$, with the usual meanings of $C$ and $E$, where, for all $(C, E) \in \mathcal{T}$, the set of inputs of $C$ (that is, the domain of functions in $C$) is included in $I$.

Let $\mathcal{T} \subseteq$ Class I. A computable function $d : \mathcal{T} \to 2^{I}$ is a finite suite derivation for $\mathcal{T}$ if, for all $p \in \mathcal{T}$, $d(p)$ is a finite complete test suite for $p$.

Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be testing problems for $I_1$ and $I_2$. $\mathcal{T}_1$ can be finitely reduced to $\mathcal{T}_2$, denoted by $\mathcal{T}_1 \leq_F \mathcal{T}_2$, if there exist some computable functions $e : \mathcal{T}_1 \to \mathcal{T}_2$ and $t : 2^{I_2} \to 2^{I_1}$ such that, for all $p_1 \in \mathcal{T}_1$ and $\mathcal{I} \subseteq I_2$, if $\mathcal{I}$ is a finite complete test suite for $e(p_1)$ then $t(\mathcal{I})$ is a finite complete test suite for $p_1$. □

Theorem 3. (Testing reduction Theorem) We have the following properties:

(a) $\leq_F$ is a preorder.

(b) Let $\mathcal{T}_1 \leq_F \mathcal{T}_2$. If there exists a finite suite derivation for $\mathcal{T}_2$ then there exists a finite suite derivation for $\mathcal{T}_1$.

(c) Let $\mathcal{T}_1 \leq_F \mathcal{T}_2$. If $\mathcal{T}_2 \subseteq$ Class I then $\mathcal{T}_1 \subseteq$ Class I.

Proof. We will consider that $\mathcal{T}_1$, $\mathcal{T}_2$, and $\mathcal{T}_3$ are testing problems for the sets of inputs $I_1$, $I_2$, and $I_3$, respectively.

Let us prove (a). First, we consider the reflexivity of $\leq_F$. Let $e : \mathcal{T}_1 \to \mathcal{T}_1$ and $t : 2^{I_1} \to 2^{I_1}$ be identity functions. We trivially have that, for all $(C, E) \in \mathcal{T}_1$ and $\mathcal{I} \subseteq I_1$, if $\mathcal{I}$ is a finite complete test suite for $e(C, E) = (C, E)$ then $t(\mathcal{I}) = \mathcal{I}$ is a finite complete test suite for $(C, E)$. Thus, $\mathcal{T}_1 \leq_F \mathcal{T}_1$.

We prove the transitivity of $\leq_F$. Let us assume $\mathcal{T}_1 \leq_F \mathcal{T}_2$ and $\mathcal{T}_2 \leq_F \mathcal{T}_3$. There exist functions $e : \mathcal{T}_1 \to \mathcal{T}_2$ and $t : 2^{I_2} \to 2^{I_1}$ such that if $\mathcal{I}$ is a finite complete test suite for $e(C_1, E_1)$ then $t(\mathcal{I})$ is a finite complete test suite for $(C_1, E_1)$, for all $(C_1, E_1) \in \mathcal{T}_1$ and $\mathcal{I} \subseteq I_2$. Besides, there also exist functions $e' : \mathcal{T}_2 \to \mathcal{T}_3$ and $t' : 2^{I_3} \to 2^{I_2}$ such that if $\mathcal{I}$ is a finite complete test suite for $e'(C_2, E_2)$ then $t'(\mathcal{I})$ is a finite complete test suite for $(C_2, E_2)$, for all $(C_2, E_2) \in \mathcal{T}_2$ and $\mathcal{I} \subseteq I_3$. Let $e'' = e' \circ e$ and $t'' = t \circ t'$. Let $(C_1, E_1) \in \mathcal{T}_1$ and let $\mathcal{I}$ be a finite complete test suite for $e''(C_1, E_1)$. This implies that $t'(\mathcal{I})$ is a finite complete test suite for $e(C_1, E_1)$. In turn, this implies that $t(t'(\mathcal{I}))$ is a finite complete test suite for $(C_1, E_1)$. Thus, we can use functions $e''$ and $t''$ to make the required transformations between $\mathcal{T}_1$ and $\mathcal{T}_3$, and we have $\mathcal{T}_1 \leq_F \mathcal{T}_3$.

Next we consider (b). Since $\mathcal{T}_1 \leq_F \mathcal{T}_2$, there exist computable functions $e : \mathcal{T}_1 \to \mathcal{T}_2$ and $t : 2^{I_2} \to 2^{I_1}$ such that if $\mathcal{I}$ is a finite complete test suite for $e(C_1, E_1)$ then $t(\mathcal{I})$ is a finite complete test suite for $(C_1, E_1)$, for all $(C_1, E_1) \in \mathcal{T}_1$ and $\mathcal{I} \subseteq I_2$. Besides, let $d : \mathcal{T}_2 \to 2^{I_2}$ be a finite suite derivation for $\mathcal{T}_2$. We know that $d$ is computable, and $d(C_2, E_2)$ is a finite complete test suite for $(C_2, E_2)$, for all $(C_2, E_2) \in \mathcal{T}_2$. Therefore, given $(C_1, E_1) \in \mathcal{T}_1$, $t(d(e(C_1, E_1)))$ is a finite complete test suite for $(C_1, E_1)$. The composition of computable functions is computable, so $h = t \circ d \circ e$ is computable. Thus, $h$ is a finite suite derivation for $\mathcal{T}_1$.

Property (c) is proved by using very similar arguments as in (b), though now we do not construct a finite suite derivation for $\mathcal{T}_1$. Instead, functions $e$ and $t$ are used to prove the existence of a finite complete test suite for each $(C_1, E_1) \in \mathcal{T}_1$ as follows. Since $\mathcal{T}_2 \subseteq$ Class I, there exists a finite complete test suite for $e(C_1, E_1) \in \mathcal{T}_2$. By applying function $t$, this suite can be transformed into a finite complete test suite for $(C_1, E_1) \in \mathcal{T}_1$. Thus, $\mathcal{T}_1 \subseteq$ Class I. □

The relation $\leq_F$ lets us relate testing problems to each other. In this sense, it resembles the reductions of computability and complexity theory, where a computation problem is transformed into another problem.10 In our case, we check whether finding a finite complete test suite in a given scenario can be achieved by means of finding a finite complete test suite in another one.
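The construction $h = t \circ d \circ e$ used in the proof of Theorem 3(b) can be sketched in Python as follows. This is a toy illustration under loud assumptions: testing problems are encoded as pairs (dictionary of functions, names of correct functions), the distinguishing criterion is disjointness of output sets, and the reduction `e`/`t` merely renames inputs between upper and lower case. The derivation `d` simply applies every input, which is a finite complete suite only when the input set is finite and some complete suite exists. None of these names or encodings come from the paper.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Function = Dict[str, FrozenSet[str]]                    # input -> possible outputs
Problem = Tuple[Dict[str, Function], FrozenSet[str]]    # (C, names of functions in E)
Suite = Set[str]


def is_complete(problem: Problem, suite: Suite) -> bool:
    """Every (correct, incorrect) pair is told apart by some input of the suite."""
    formalism, spec = problem
    return all(
        any(formalism[c][i].isdisjoint(formalism[w][i]) for i in suite)
        for c in formalism if c in spec
        for w in formalism if w not in spec
    )


def reduce_derivation(e: Callable[[Problem], Problem],
                      d: Callable[[Problem], Suite],
                      t: Callable[[Suite], Suite]) -> Callable[[Problem], Suite]:
    """Build h = t . d . e, a suite derivation for T1 from one for T2."""
    return lambda problem: t(d(e(problem)))


def e(problem: Problem) -> Problem:
    # Hypothetical reduction: map a T1 instance (upper-case inputs) into T2.
    formalism, spec = problem
    renamed = {name: {i.lower(): outs for i, outs in f.items()}
               for name, f in formalism.items()}
    return renamed, spec


def d(problem: Problem) -> Suite:
    # Naive derivation for T2: apply every available input.
    formalism, _ = problem
    return {i for f in formalism.values() for i in f}


def t(suite: Suite) -> Suite:
    # Translate a T2 suite back into T1 inputs.
    return {i.upper() for i in suite}


h = reduce_derivation(e, d, t)
p1: Problem = (
    {"f": {"X": frozenset({"0"}), "Y": frozenset({"1"})},
     "g": {"X": frozenset({"0"}), "Y": frozenset({"2"})}},
    frozenset({"f"}),
)
print(sorted(h(p1)), is_complete(p1, h(p1)))  # ['X', 'Y'] True
```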

We illustrate $\leq_F$ in Case Study 5.3 of the next section, where the problems of testing FSMs, EFSMs and TEFSMs are related by means of testing reductions. Due to some finiteness constraints imposed on EFSMs and TEFSMs, it turns out that each computation formalism can simulate the other two. However, note that, in general, a testing reduction does not consist in mapping one computation formalism into another, but in mapping the border between correctness and incorrectness from one problem to another. This is illustrated later in the same case study with an additional short example, where testing Turing Machines is reduced to testing Finite Automata when faults follow a given form and no size constraint is assumed (so the latter cannot simulate the former). Section 5 also presents the more elaborate Case Study 5.4, where testing several kinds of machines involving an increasing magnitude (some denoted by infinite computation formalisms) is reduced to the problem of testing deterministic FSMs with no more than $n$ states (which is a finite computation formalism). Moreover, in some cases transitions are labeled with magnitude values from a continuous domain, and yet they can be reduced to that problem. Also, a series of reductions proves the inclusion of the problem involving $n$ magnitudes into Class I, and this is used to develop a method to test a car control device.

5 CASE STUDIES

In this section we present four elaborate examples where the notions proposed in the paper are applied to different scenarios. They include the discovery of some interesting properties for some well-known testing frameworks as well as other useful ones.

5.1 Case Study: Hennessy’s Framework

Let us study the testability of a classical testing framework in terms of the testability notions proposed here. In particular, we will study the relation of the semantic framework

10. In fact, a testing problem also represents a kind of computability problem.


proposed by M. Hennessy in [17] with Class II. In that framework, a theory of concurrent processes is presented. Systems are defined as processes, and each process implicitly defines an associated labeled transition system denoting how actions are iteratively performed. There is no distinction between inputs and outputs; we just consider actions. In order to check if a process passes a test, all possible computations of the process synchronized in parallel with the test (that is, all possible sequences of transitions which may be taken by the process in response to the test) are studied. There are two notions of passing a test: must and may test passing. A process passes a test in the must sense if all possible computations are considered correct by the test. A process passes a test in the may sense if there exists at least one computation considered correct by the test. In both cases, the outcome of a test is either acceptance or failure.

Before presenting in more detail other relevant aspects of that framework, it is worth mentioning that the effect of non-determinism on whether we can consider a test suite as complete is different in that framework and ours. We illustrate it with an example defined in our setting. Let $f_1(i) = \{a\}$ and $f_2(i) = \{a, b\}$. That is, if input $i$ is given, then $f_1$ will always reply $a$, whereas $f_2$ non-deterministically replies either $a$ or $b$. In our framework we consider that $\{i\}$ is not a complete test suite for $(\{f_1, f_2\}, \{f_1\})$ because, after applying $i$ to either $f_1$ or $f_2$, its reply does not necessarily let us decide whether it was $f_1$ or $f_2$ (the reply $a$ does not allow us to determine which one the IUT is). On the contrary, according to other views of non-determinism in testing such as Hennessy’s, $\{i\}$ does distinguish $f_1$ and $f_2$. In terms of our setting, this can be justified as follows. Let us assume that, if we apply input $i$ many times to $f_2$, then we will eventually observe $a$ and we will eventually observe $b$, that is, we assume a fairness hypothesis. If we assume that all non-deterministic reactions of the IUT to a given tester interaction will be observed at least once after performing that interaction a high enough number of times, then applying $i$ a high enough number of times will finally reveal whether the IUT is $f_1$ or $f_2$: if $b$ is eventually replied then it is $f_2$, else it is $f_1$. There are several ways to effectively model this particular view of non-determinism in our general framework. Perhaps the simplest and most direct one consists just in representing this framework as a deterministic environment where, in particular, there exists a (single) output called “$\{a\}$” as well as a (single) output called “$\{a, b\}$”. Also, we assume that our distinguishing relation $\nleftrightarrow$ distinguishes output $\{a\}$ from output $\{a, b\}$, because the repetition of any interaction producing one of them will eventually distinguish it from the other, so in general we consider $\nleftrightarrow$ iff $\neq$ (that is, two outputs are distinguished iff they differ). Thus, the (deterministic) functions $f_1(i) = \{\{a\}\}$ and $f_2(i) = \{\{a, b\}\}$ are distinguished by input $i$ in our framework, so $\{i\}$ is a complete test suite.
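The contrast between both views can be made concrete with a small Python sketch. It assumes, as a simplification, that two replies to the same input are distinguished exactly when the corresponding output sets are disjoint; the helper name `distinguished_by` and the encoding of outputs as frozensets are hypothetical.

```python
from typing import FrozenSet, Hashable


def distinguished_by(f_out: FrozenSet[Hashable], g_out: FrozenSet[Hashable]) -> bool:
    """Assumed criterion: no possible reply of one function is a possible reply of the other."""
    return f_out.isdisjoint(g_out)


# Non-deterministic view: each action is an individual output.
f1_nd = frozenset({"a"})
f2_nd = frozenset({"a", "b"})

# Deterministic view: the whole set of observable actions is one single output.
f1_det = frozenset({frozenset({"a"})})
f2_det = frozenset({frozenset({"a", "b"})})

print(distinguished_by(f1_nd, f2_nd))    # False: reply a is compatible with both
print(distinguished_by(f1_det, f2_det))  # True: outputs {a} and {a, b} differ
```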

Let us introduce other important aspects of the framework given in [17] which must be considered here. The author provides a testing semantics for a process algebra at three levels: an operational semantics, a denotational semantics, and an axiomatic semantics. It is proved that the three semantics are equivalent. The author considers a set of actions $A$, and an algebra with the typical operators of a process algebra. Then, an operational semantics is defined as well as a testing semantics. One of the key points is that the testing semantics can be characterized just by using the operational semantics. In this way, the author defines the acceptance of a process $p$ after a trace $s \in A^*$ as $\mathcal{A}(p, s) \in \mathcal{P}(\mathcal{P}(A))$, where $\mathcal{P}(X)$ denotes the powerset of $X$. This set characterizes the non-deterministic possibilities of a process $p$ after executing the trace $s$. For instance, $\mathcal{A}(p, \epsilon) = \{\{a\}, \{b\}\}$ indicates that the process can initially execute internal actions, and then it reaches either a state where only $a$ is possible, or a state where only $b$ is possible. On the other hand, $\mathcal{A}(q, \epsilon) = \{\{a, b\}\}$ indicates that, after possibly executing internal actions, the process $q$ is ready to offer $a$ and $b$, so the environment (e.g., a test) can choose either action. The process $p$ can be written syntactically as $a \oplus b$ (internal choice), whereas $q$ can be written as $a + b$ (external choice). In terms of tests, we say that the test $T = a;\omega$ is passed by process $q$ but not by $p$ ($p$ fails the test $T$) under the must testing semantics. As mentioned before, a test is passed under this semantics if all possible computations of the process with the test are passed. It is clear that a single execution of the test $T$ and $p$ might not fail, because the execution of $a$ is possible in $p$. However, if $T$ is repeated, then the process $p$ will eventually choose the other branch, and then the test $T$ will fail.
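The difference between must and may passing for the two processes above can be sketched in Python. The sketch is a deliberate simplification: it only covers a one-step test that offers a single action and then reports success, and it represents a process solely by its initial acceptance set. The names `must_pass` and `may_pass` are hypothetical.

```python
from typing import FrozenSet

AcceptanceSet = FrozenSet[FrozenSet[str]]


def must_pass(acceptance: AcceptanceSet, action: str) -> bool:
    """Every state the process may silently reach is ready to perform `action`."""
    return all(action in ready for ready in acceptance)


def may_pass(acceptance: AcceptanceSet, action: str) -> bool:
    """At least one state the process may silently reach is ready to perform `action`."""
    return any(action in ready for ready in acceptance)


p = frozenset({frozenset({"a"}), frozenset({"b"})})  # internal choice between a and b
q = frozenset({frozenset({"a", "b"})})               # external choice between a and b

print(must_pass(q, "a"), must_pass(p, "a"), may_pass(p, "a"))  # True False True
```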

Next we show how Hennessy’s framework can be expressed in our framework and related to Class II. Here we will assume that $A$ is a finite set (this restriction is only necessary to prove our result regarding Class II). We will consider the denotational view of processes and the must testing semantics. Under these assumptions, a process is just a tree where nodes are decorated with acceptance sets $\mathcal{A} \subseteq \mathcal{P}(A)$ that have to be closed under union and convexity, and arcs are labeled by some action $a \in A$. Moreover, if a node is decorated with set $\mathcal{A}$, then it has a unique outgoing arc labeled with each $a \in \bigcup_{X \in \mathcal{A}} X$. Under this view, a tree $t$ can be seen as a function from the set of traces $A^*$ to $\mathcal{P}(\mathcal{P}(A))$ where $t(s)$ is the decoration of the node reached after processing the sequence $s$. We assume that function $t$ also fulfills the following restriction: for any $a \in A$ and $s \in A^*$, $t(sa) = \bot$ iff $a \notin \bigcup_{X \in \mathcal{A}} X$, where $\mathcal{A} = t(s)$. The set of all of these trees is denoted by fATS.11

In order to represent Hennessy’s model in our setting, we consider that our set of inputs $I$ is the set of traces $A^*$, and the set of outputs is $O = \{\mathcal{A} \mid \mathcal{A} \subseteq \mathcal{P}(A), \mathcal{A} \text{ is closed under union and convexity}\}$.

The distinguishing relation $\nleftrightarrow$ is the trivial one. We define our computation formalism as $C = \{f_t \mid t \in \mathit{fATS}\}$, where $f_t(s) = \{\mathcal{A}\}$ iff $t(s) = \mathcal{A}$ (recall that we consider $\mathcal{A}$ as a single output of $f_t$ in our setting). The set $E$ consists of some function $f_0 \in C$ denoting some tree $t_0 \in \mathit{fATS}$, so $E = \{f_0\}$.

Next we prove $(C, E) \in$ Class II. Initially we will consider the set $CF = \{f_t \mid t \in \mathit{fATS}, t \text{ finite}\}$,12 and we will prove $(CF \cup \{f_0\}, \{f_0\}) \in$ Class II. The more general case

11. We have chosen to represent the set fATS for clarity reasons in this example, as there is only one kind of node in these trees. However, it would not be very hard to consider the sets ATS or AT. In these cases, the definitions of the functions would be a bit more complicated, because there are two kinds of nodes (open and closed), and there are some restrictions that apply to open nodes.

12. A tree is finite if its number of traces is finite, formally $|\{s \mid t(s) \neq \bot\}| < \infty$.


where $f_0$ is also compared with functions denoting infinite trees will be tackled afterwards. For any $k \in \mathbb{N}$, we consider the following equivalence relation: $t \equiv_k t'$ iff $t(s) = t'(s)$ for all $s \in A^{k'}$ with $k' \leq k$.13 It is easy to check that $\equiv_k$ is indeed an equivalence relation. For any $t \in CF$, the equivalence class induced by $t$ for $\equiv_k$ is denoted by $[t]_k$, and the set of all equivalence classes for $\equiv_k$ will be denoted by $CF/{\equiv_k}$.

Let us introduce the sets $C_0, C_1, \ldots$ required in the definition of Class II. The set $C_0$ consists of one representative of each equivalence class of $\equiv_0$, though we force $f_0$ to be chosen as the representative of its equivalence class. Since the alphabet is finite, the number of equivalence classes induced by $\equiv_k$ is finite for all $k \in \mathbb{N}$. Let $t_1, \ldots, t_g$ be arbitrary elements, one from each equivalence class $c \in CF/{\equiv_0}$ with $c \neq [t_0]_0$. Then, $C_0 = \{f_0, f_{t_1}, \ldots, f_{t_g}\}$.

Similarly, $C_{i+1}$ is built by considering one representative of each equivalence class of $\equiv_{i+1}$, but forcing that the representatives already chosen in $C_i$ are also the representatives of their corresponding equivalence classes of $CF/{\equiv_{i+1}}$ in the set $C_{i+1}$. So for any $c \in CF/{\equiv_{i+1}}$:

• If there is some $f_t \in C_i$ such that $t \in c$, then we pick $f_t$ as representative of $c$ in $C_{i+1}$, that is, $f_t \in C_{i+1}$.

• Otherwise, we choose any $t \in c$ and make $f_t \in C_{i+1}$.

It is clear that $C_i \subseteq C_{i+1}$, $CF = \bigcup_{i \in \mathbb{N}} C_i$, and the set of inputs $A^i$ is a complete test suite for $(C_i, E)$. Since $A$ is finite, the set $A^i$ is a finite complete test suite.

Let $s_i = |C_i|$. It is clear that $\lim_{n \to \infty} s_n = \infty$. Let $n, l \in \mathbb{N}$ be such that $l > n$. The set of inputs $A^n$ can only distinguish elements that are in different classes of $\mathit{fATS}/{\equiv_n}$. Let us consider the set $T = \{f \mid f \in C_l, f \in [t_0]_n\}$. Then, $|\{(f_0, f') \mid \mathit{di}(f_0, f', A^n), f' \in C_l \setminus \{f_0\}\}| = s_l - |T|$.

Next we identify an upper bound of $|T|$. For any $k \geq n$, let us consider the set $T^{k,n}_{t_0} \subseteq C_k$ containing all processes that, at least, include all traces of $t_0$ whose length is $n$:

\[ T^{k,n}_{t_0} = \{f \mid f \in C_k, \forall s \in A^n : f_0(s) \neq \bot \Rightarrow f(s) \neq \bot\}. \]
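The nested choice of representatives $C_0 \subseteq C_1 \subseteq \cdots$ described above can be sketched in Python. The sketch works under strong assumptions: it only sees a finite pool of trees (whereas $CF$ is infinite), a tree is encoded as a dictionary from traces to decorations with absent traces standing for $\bot$, and functions are identified by their index in the pool. The names `signature` and `nested_representatives` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Trace = Tuple[str, ...]
Tree = Dict[Trace, FrozenSet[FrozenSet[str]]]  # absent trace = undefined (bottom)


def signature(tree: Tree, k: int) -> FrozenSet:
    """Restriction of a tree to traces of length <= k.

    Two trees are equivalent up to depth k exactly when their signatures coincide.
    """
    return frozenset((s, dec) for s, dec in tree.items() if len(s) <= k)


def nested_representatives(pool: List[Tree], spec: int, max_k: int) -> List[List[int]]:
    """Return index lists C_0, ..., C_max_k with C_i included in C_{i+1}."""
    chain: List[List[int]] = []
    kept = [spec]  # the specification is forced to represent its own class
    for k in range(max_k + 1):
        rep_of: Dict[FrozenSet, int] = {}
        # Earlier representatives are visited first, so they keep their role.
        for idx in kept + list(range(len(pool))):
            rep_of.setdefault(signature(pool[idx], k), idx)
        kept = sorted(rep_of.values())
        chain.append(kept)
    return chain


# Hypothetical pool of three finite trees over the alphabet {a, b}.
EMPTY = frozenset({frozenset()})
ONLY_A = frozenset({frozenset({"a"})})
ONLY_B = frozenset({frozenset({"b"})})
pool = [
    {(): ONLY_A, ("a",): EMPTY},
    {(): ONLY_A, ("a",): ONLY_A, ("a", "a"): EMPTY},
    {(): ONLY_B, ("b",): EMPTY},
]
print(nested_representatives(pool, spec=0, max_k=1))  # [[0, 2], [0, 1, 2]]
```

In this toy pool the first two trees agree at depth 0, so only one of them (the specification) is kept in $C_0$, and they are separated once depth 1 is inspected.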

Let us prove $|T^{l,n}_{t_0}| = |T^{n,n}_{t_0}| \cdot |T|$. In order to do so, we define a bijection between $T \times T^{n,n}_{t_0}$ and $T^{l,n}_{t_0}$. We define $b : T \times T^{n,n}_{t_0} \to T^{l,n}_{t_0}$ as follows. Let $(f_1, f_2) \in T \times T^{n,n}_{t_0}$, and let us consider the following function:

\[ f(s) = \begin{cases} f_2(s) & \text{if } |s| \leq n \\ f_1(s) & \text{if } |s| > n. \end{cases} \]

It is easy to check that $f \in T^{l,n}_{t_0}$. We can define $b(f_1, f_2)$ as the unique representative of $f$ in $C_l$. It is not difficult to prove that $b$ is indeed a bijection. First, we prove that $b$ is surjective. Let us consider $f \in T^{l,n}_{t_0}$. Let us build the function $f'$ such that $f'(s) = f_0(s)$ for any trace $s$ such that $|s| \leq n$ and $f'(s) = f(s)$ for any trace $s$ such that $|s| > n$. Then let us consider $f_1$ as the unique representative in $C_l$ of $[f']_l$. It is clear that $f_1 \in T$. Now let us consider $f_2$ as the unique representative in $C_n$ of $f$. Since $f \in T^{l,n}_{t_0}$, we have $f_2 \in T^{n,n}_{t_0}$. Then, by definition, $b(f_1, f_2) = f$. Second, let us prove that $b$ is injective. Let us consider $(f_1, f_2), (f'_1, f'_2) \in T \times T^{n,n}_{t_0}$ such that $b(f_1, f_2) = b(f'_1, f'_2)$. Since $f_1, f'_1 \in T$, we have $f_1(s) = f'_1(s) = f_0(s)$ for any trace $s$ such that $|s| \leq n$. By the definition of $b$, we obtain $f_1(s) = f'_1(s)$ for any trace $s$ with $n < |s| \leq l$; so $[f_1]_l = [f'_1]_l$. Since $T \subseteq C_l$ and there is only one representative per class in $C_l$, we obtain $f_1 = f'_1$. By the definition of $b$ we obtain $f_2(s) = f'_2(s)$ for any trace $s$ such that $|s| \leq n$. So $[f_2]_n = [f'_2]_n$. Since $f_2, f'_2 \in T^{n,n}_{t_0} \subseteq C_n$ and there is only one representative per class in $C_n$, we obtain $f_2 = f'_2$.

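Then we obtain $|T| = |T^{l,n}_{t_0}| / |T^{n,n}_{t_0}|$. Since $T^{l,n}_{t_0} \subseteq \mathit{fATS}/{\equiv_l}$, we infer $|T| \leq s_l / |T^{n,n}_{t_0}|$. It is also clear that $\lim_{n \to \infty} |T^{n,n}_{t_0}| = \infty$. So

\[ \text{d-rate}(A^n, C_l, E) = \frac{|\{(f_0, f') \mid \mathit{di}(f_0, f', A^n), f' \in C_l \setminus \{f_0\}\}|}{s_l - 1} = \frac{s_l - |T|}{s_l - 1} \geq \frac{s_l - \frac{s_l}{|T^{n,n}_{t_0}|}}{s_l - 1} = \frac{s_l \left(1 - \frac{1}{|T^{n,n}_{t_0}|}\right)}{s_l - 1}. \]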

Let $\epsilon < 1$. Let us consider the expression $S_k = 1 - \frac{1}{|T^{k,k}_{t_0}|}$. The expression $\frac{s_n S_k}{s_n - 1}$ is a decreasing sequence whose limit is $S_k$ when $n$ tends to infinity. So, in order to make $\frac{s_n S_k}{s_n - 1} > \epsilon$, it is enough to find a value of $S_k$ such that $S_k > \epsilon$. Since the sequence $S_k$ converges to 1, there exists $n_0$ such that $S_{n_0} \geq \epsilon$. Hence, for all $l \geq n_0$ we obtain $\frac{s_l S_{n_0}}{s_l - 1} > \epsilon$. We deduce $(CF, E) \in$ Class II.

In order to prove that $(C, E) \in$ Class II, let us consider $CI = C \setminus CF$. The set $CI$ is countable, so let us consider an enumeration $c_0, c_1, c_2, \ldots$ of $CI$. Now let us consider the sequence $C'_i = C_i \cup \{c_0, \ldots, c_i\}$. In each $C'_i$ there are $i + 1$ more elements than in $C_i$. At worst, those elements are not distinguishable from $f_0$. So

\[ \text{d-rate}(A^n, C_l, E) \geq \frac{s_l - |T| - (l + 1)}{s_l - 1} \geq \frac{s_l - \frac{s_l}{|T^{n,n}_{t_0}|} - (l + 1)}{s_l - 1} = \frac{s_l \left(1 - \frac{1}{|T^{n,n}_{t_0}|}\right)}{s_l - 1} - \frac{l + 1}{s_l - 1}. \]

In this case, we also have that $\frac{l + 1}{s_l - 1}$ is a sequence whose limit is 0. Hence we infer $(C, E) \in$ Class II.
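The two lower bounds derived in this case study can be evaluated numerically with exact rational arithmetic, which may help to see how they behave as the sizes grow. The following Python sketch just encodes both closed forms; the function names and the sample values of $s_l$, $|T^{n,n}_{t_0}|$ and $l$ are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def finite_bound(s_l: int, t_nn: int) -> Fraction:
    """Lower bound s_l * (1 - 1/|T^{n,n}|) / (s_l - 1) for finite trees."""
    return Fraction(s_l) * (1 - Fraction(1, t_nn)) / (s_l - 1)


def general_bound(s_l: int, t_nn: int, l: int) -> Fraction:
    """Same bound after discounting the l + 1 extra functions of infinite trees."""
    return finite_bound(s_l, t_nn) - Fraction(l + 1, s_l - 1)


print(finite_bound(101, 10))      # 909/1000
print(general_bound(101, 10, 4))  # 859/1000
```

With these sample values, the correction term $\frac{l+1}{s_l - 1}$ equals $\frac{5}{100}$, so the second bound is only slightly below the first one.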

5.2 Case Study: Timeout Machines

Our second case study also tackles an elaborate (and somewhat surprising) case of inclusion in Class II. Let us consider FSMs with rational timeouts (in short, timeout machines hereafter). These machines work as follows. We consider that all states have exactly one outgoing transition labeled with each input and some output (thus they are fully-specified and deterministic). In addition, all states also have one timeout transition, which is triggered when the time spent in that state reaches a given rational value $r$ with $0 < r \leq M$ for some given rational $M$. The transition leads the machine to another state. Timeout machines are stimulated by means of sequences where some waiting times strictly alternate with actions where some input button is pressed and some output is replied (the sets of input and output symbols are finite). Note that, contrary to our previous examples, now the choices of the tester at each step of the stimulation of the IUT are infinite: the tester can press a button (finite set of choices) or wait for any rational time within an interval (infinite set of choices). This makes testing

13. $A^k$ is the set of all traces whose length is no longer than $k$.


particularly difficult: at each step of the execution of the IUT, any finite number of tests can check only what happens in a finite number of situations, but there are infinitely many choices we could take at this step. Thus, any finite testing plan checks a 0 proportion of all cases that must be checked at each point (a finite amount out of an infinite amount which must be checked). Yet this case is in Class II, as we prove next.14

Let $E = \{f\}$, where $f$ represents the behavior of a timeout machine (actually many, as it represents all machines with the same behavior). Let $m_f$ be the minimum timeout machine, i.e., the one having the fewest states, behaving as $f$ (note that it is unique up to isomorphism), and let it have $n$ states. Without loss of generality, let us suppose that all timeouts of $m_f$ are multiples of some $\frac{1}{v}$. Note that, even if all rational timeouts in $m_f$ have different denominators after simplification, all of them can be expressed as a multiple of some $\frac{1}{v}$ where $v$ is the product of all the denominators of timeouts of $m_f$.

Let $C$ consist of all functions denoting timeout machines. Let us remark that machines represented by functions in $C$ can have any number of states. The set of inputs which can be received by each function $f \in C$, denoted as usual by $I$, is the set of sequences which strictly alternate input symbols and rational delays as mentioned before.
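The remark that all timeouts of $m_f$ can be expressed as multiples of a common unit $\frac{1}{v}$ can be checked with Python's exact rationals. The following is a minimal sketch; the function name `common_unit_denominator` and the sample timeouts are hypothetical.

```python
from fractions import Fraction
from math import prod
from typing import List


def common_unit_denominator(timeouts: List[Fraction]) -> int:
    """Product v of all (simplified) denominators, so every timeout is a multiple of 1/v."""
    return prod(t.denominator for t in timeouts)


timeouts = [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]
v = common_unit_denominator(timeouts)
# Each timeout times v is an integer: the number of 1/v units it spans.
print(v, [int(t * v) for t in timeouts])  # 24 [12, 16, 18]
```

The least common multiple of the denominators (12 in this example) would also work; the product is used here only because it is the choice made in the text.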

Let us define the required sets $C_1 \subseteq C_2 \subseteq \cdots$ such that $\bigcup_{i \in \mathbb{N}^+} C_i = C$. We define the functions included in each set $C_i$ as follows. Let the set $A^j_i$ consist of all deterministic partially-specified timeout machines with the form of trees where the root denotes the initial state, all branches have length $i$, all non-leaf nodes denote fully specified states (i.e., they have one timeout transition as well as one transition labeled with each input and some output), all leaf nodes denote unspecified states (i.e., they have no outgoing transition), and all timeout transitions are labeled with rational numbers $0 < r \leq M$ which are multiples of $\frac{1}{j}$. Hereafter this kind of machine will be denoted as partially-specified tree-form timeout machines. We will say that a function $f'$ is consistent with some partially-specified tree-form timeout machine $m'$ of depth $k$ if the following condition holds: $f'$ behaves as some minimum timeout machine $m''$ where all sequences of $k$ consecutive transitions from the initial state are labeled with the same inputs, outputs, and timeouts as some sequence of transitions available in $m'$ from its initial state.

Similarly as in the proof of $(C_2, E_2) \in$ Class II in Example 4, let us consider that the minimum function fulfilling a given criterion $H$ is the function fulfilling $H$ which can be represented with a machine (in this case, a timeout machine) with the minimum number of states (we assume that ties between timeout machines with the same size are broken in some arbitrary fixed way). For each $m \in A^j_i$, let $f_m \in C$ be the minimum function such that (a) it is consistent with $m$ and (b) all timeouts of the minimum timeout machine represented by $f_m$ are multiples of $\frac{1}{j}$ (not only those included by $m$). Then we define $C_i = \{f\} \cup \{f_m \mid m \in A^{i!}_i\}$ (recall that $f$ is the specification function). Let us check that $\bigcup_{i \in \mathbb{N}^+} C_i = C$ and $C_i \subseteq C_{i+1}$ for $i \in \mathbb{N}^+$.

First we see that $C_i \subseteq C_{i+1}$ for all $i \in \mathbb{N}^+$. All functions of $C_i$ but $f$ are constructed from the partial tree-form machines of $A^{i!}_i$. On the one hand, note that all timeout values used in functions in $C_i$ are allowed by $C_{i+1}$ (we have $\frac{1}{i!} = \frac{1}{(i+1)!} \cdot (i+1)$, so all timeouts used in machines represented by functions of $C_i$ can also be expressed as multiples of $\frac{1}{(i+1)!}$). On the other hand, note that the minimum function which is consistent with some partially-specified tree-form machine used to build $C_i$ will also be so for some partially-specified tree-form machine used to build $C_{i+1}$. In particular, since the set of tree-form machines considered in the construction of functions in $C_{i+1}$ covers all possible ways to interact during an $(i+1)$-long interaction (where timeouts are multiples of $\frac{1}{(i+1)!}$), one of them must coincide with the behavior of each function which was the minimum one being consistent with some tree-form machine considered in the construction of functions in $C_i$ (where timeouts are multiples of $\frac{1}{i!} = \frac{1}{(i+1)!} \cdot (i+1)$). Thus $C_i \subseteq C_{i+1}$.
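The arithmetic fact behind the first part of this argument, namely that every timeout expressible with unit $\frac{1}{i!}$ is also expressible with unit $\frac{1}{(i+1)!}$, can be verified with a short Python sketch. The function name `timeout_grid` and the bound $M = \frac{3}{2}$ are hypothetical choices for illustration.

```python
from fractions import Fraction
from math import factorial
from typing import Set


def timeout_grid(i: int, M: Fraction) -> Set[Fraction]:
    """All timeouts r with 0 < r <= M that are multiples of 1/i!."""
    step = Fraction(1, factorial(i))
    # int() truncates, which is the floor here because M / step is positive.
    return {k * step for k in range(1, int(M / step) + 1)}


M = Fraction(3, 2)
# 1/3! equals (3 + 1) times 1/(3 + 1)!.
print(Fraction(1, factorial(3)) == 4 * Fraction(1, factorial(4)))  # True
# Each grid is contained in the next, finer one.
print(all(timeout_grid(i, M) <= timeout_grid(i + 1, M) for i in range(1, 5)))  # True
```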

Now we show $\bigcup_{i \in \mathbb{N}^+} C_i = C$. Note that, as we commented before when we introduced the specification function $f$, forcing all timeouts of each machine to be multiples of the same rational number $\frac{1}{i}$ for some $i \in \mathbb{N}^+$ does not prevent us from denoting any function in $C$. Moreover, all timeouts that are multiples of some $\frac{1}{i}$ are also multiples of $\frac{1}{i!}, \frac{1}{(i+1)!}, \ldots$, so all sets $C_i, C_{i+1}, \ldots$ let us represent functions where all timeouts are multiples of some $\frac{1}{i}$. Let us prove that any function $f' \in C$ is included in some $C_i$, either because it is the minimum machine being consistent with some of the partially-specified tree-form machines considered in the construction of $C_i$, or because it is $f$ (the specification). Note that $f$ is explicitly added to all $C_i$, so this case is trivial. Let us consider $f' \neq f$. Without loss of generality, let us assume that $f'$ represents some timeout machine where all timeouts are multiples of some $\frac{1}{d!}$ with $d \in \mathbb{N}^+$. Note that $f'$ is consistent with a single partially specified machine of $A^{i!}_i$ if $i \geq d$, because all possible trees of transitions of depth $i$ where timeouts are multiples of $\frac{1}{i!}$ are represented in $A^{i!}_i$, and exactly one of them is consistent with $f'$. Thus, if a given function $f' \in C$ is not in some $C_i$ with $i \geq d$, it is because $f'$ is not the minimum function being consistent with the single partial machine of $A^{i!}_i$ it is consistent with. Could $f'$ be such that, for all $i \geq d$, it is not the minimum function consistent with the single partial machine of $A^{i!}_i$ it is consistent with? Let us suppose that the minimum timeout machine behaving as $f'$ has $J$ states. Let $K$ be the number of minimum timeout machines such that all of their timeouts are multiples of $\frac{1}{d!}$ and have $J$ or fewer

14. One could consider that, since measuring devices are digital and only discrete magnitudes can be considered in practice, the difficulty of dealing with dense time (rational times) does not appear in practice. However, in an environment where delays can take hours or nanoseconds, considering that the set of possible times is discrete (and hence, also finite if closed intervals are considered) does not help practical testing because the number of times to be checked is astronomical anyway (this is similar to assuming that computers do not compute recursive languages but regular languages because, since the memory of a real computer is limited, a computer can be only in a finite set of states, so it can be fully denoted by some finite automaton. This is technically true, but this view does not help to reason about programs or solve any problem). On the contrary, if we develop a testing methodology under the assumption that the time is dense, then it will necessarily assume that we cannot check all times in practice.

880 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 40, NO. 9, SEPTEMBER 2014


states (the set of them is finite because all timeouts must be at most $M$). Note that, by iteratively considering the functions added by each $C_1, C_2, C_3, \ldots$, the number of times the minimum function consistent with some partially specified tree-form machine (where all timeouts are multiples of $\frac{1}{d!}$) is selected but is not $f'$ is at most $K$. Thus, $f'$ is eventually added to some $C_i$.

Let $I_d \subseteq I$ be a test suite consisting of all stimulation sequences of the form $a_1 \cdots a_d$ where, for all $1 \leq k \leq d$, $a_k$ is either an input symbol of the machine or a time delay not greater than $M$ and multiple of $\frac{1}{d}$. Note that any subsequence of $a_1 \ldots a_d$ consisting only of consecutive time delays can be trivially converted into a single delay amounting to all of them together. Similarly, any sequence of consecutive input symbols can be translated into a sequence where a 0 delay is put between each pair of input symbols. Thus, any sequence in $I_d$ can be expressed as a sequence where input symbols and time delays strictly alternate, as required for $I_d \subseteq I$. Let us find a lower bound of the distinguishing rate of $I_d$ for each pair $(C_i, \{f\})$.
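The construction of $I_d$ and the rewriting into strictly alternating form can be sketched as follows. This is only an illustration under stated assumptions: the names `stimulation_sequences` and `normalize` are hypothetical, input symbols are modeled as strings, delays as exact fractions, $M$ is taken to be an integer, the delay 0 is excluded from the alphabet, and the alternating form is assumed to start with a delay.

```python
from fractions import Fraction
from itertools import product


def stimulation_sequences(inputs, d, M):
    """Return an iterator over all sequences a_1 ... a_d of inputs and delays.

    Delays are the positive multiples of 1/d that do not exceed M
    (excluding the 0 delay is an assumption of this sketch).
    """
    delays = [Fraction(k, d) for k in range(1, d * M + 1)]
    alphabet = list(inputs) + delays
    return product(alphabet, repeat=d)


def normalize(seq):
    """Rewrite a sequence so that delays and inputs strictly alternate.

    Consecutive delays are merged into one, and a 0 delay is placed
    before any input that is not preceded by a delay.
    """
    out = []
    pending = Fraction(0)
    for item in seq:
        if isinstance(item, Fraction):
            pending += item              # merge consecutive delays
        else:
            out.extend([pending, item])  # delay (possibly 0), then input
            pending = Fraction(0)
    if pending > 0:
        out.append(pending)              # keep a trailing delay, if any
    return out
```

With one input symbol, $d = 2$ and $M = 1$, the alphabet has three items (one input and the delays $\frac{1}{2}$ and $1$), so the sketch enumerates $3^2 = 9$ sequences.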

By the construction of the sets $C_1, C_2, \ldots$, we can see the following property. Let us consider any combination of outputs (one for each input/output transition with its corresponding input) and timeouts (one for each timeout transition) which could label the first $d \leq i$ transitions reachable in a timeout machine represented by a function of $C_i$. The number of functions of $C_i$ representing a machine with that combination of timeouts and outputs along the first $d$ steps is the same as the number of functions of $C_i$ representing any other combination of timeouts and outputs during the first $d$ steps. Thus, the proportion of functions of $C_i$ having some given combination of outputs and timeouts in all transitions traversed during the first $d$ steps is equal to the proportion of machines of the set $A_d^{i!}$ (i.e., the set of partial tree-form machines of depth $d$ where all timeouts are multiples of $\frac{1}{i!}$) having the same combination. Since the capability of $I_d$ to detect the incorrectness of a function depends only on the combination of outputs and timeouts labeling the transitions of the machine represented by this function during the first $d$ steps, we conclude that the proportion of functions from $C_i$ which are (un-)distinguished by $I_d$ matches exactly the proportion of partial tree-form machines of $A_d^{i!}$ which are (un-)distinguished by $I_d$.

We could think that $I_d$ can distinguish $f$ from any function $f' \in C_d$ where the combination of outputs and timeouts of the machine represented by $f'$ along the first $d$ reachable transitions is not exactly the combination of $f$. If this were true, then $I_d$ would leave undistinguished only those functions which have exactly the same timeouts and produce the same outputs for each input during their first $d$ steps. Thus, the distinguishing rate of $I_d$ for $(C_i, \{f\})$ with $i \geq d$ would be at least one minus the proportion of functions in $C_i$ having the same timeouts and outputs as $f$ for their first $d$ steps (which, by the property introduced in the previous paragraph, would be one minus the proportion of partial tree-form machines in $A_d^{i!}$ having the same timeouts and outputs as $f$ for their first $d$ steps). Unfortunately, there are some functions $f' \in C_i$ representing machines that do not have the same combination of outputs and timeouts during the first $d$ steps as $f$, and yet $I_d$ might not be able to identify their incorrectness. In particular, when the test suite $I_d$ is applied, some alternative assignments of outputs and timeouts to the first $d$ transitions might produce a response indistinguishable from that of $f$.

On the one hand, let us suppose that two states produce exactly the same output for each input, and one of them leads to the other through a timeout transition. Let us suppose that both states are consecutively traversed by some execution of $d$ steps. Note that $I_d$ might not be able to discover when the timeout from one state to the other is taken. In particular, this is the case if the remaining steps up to the $d$th step do not produce a different behavior from both states (which obviously does not imply that longer interactions could not distinguish them). A necessary (not sufficient) condition for $I_d$ not to be able to distinguish the behavior of both states, and thus not to discover when the timeout is taken, is that the two states produce the same outputs for all inputs and the timeout of one of them leads to the other one.

On the other hand, let us note that the time delays produced in $I_d$ could be too sparse and skip some relevant time frames where incorrect behaviors might happen, so they could remain unobserved by $I_d$. Two kinds of these problems may occur:

(a) The IUT could move to another state where the behavior is wrong (i.e., the output replied for some input is different from the reply defined in the specification after the same previous interaction), and stay in that wrong state for a very short time, so the sparse times checked by $I_d$ skip the time frame where the IUT is at this state and do not discover it. A necessary condition for this is that the IUT autonomously leaves this wrong state before $\frac{1}{d}$ units of time: otherwise, some test case in $I_d$ would produce some input during the stay in that state, and its wrong behavior would be eventually discovered. Thus, the timeout of the wrong state must be less than $\frac{1}{d}$.

(b) The sparse times checked by $I_d$ might not be able to detect the precise time when the IUT takes some timeout, even when the responses of both states differ for at least one input. In fact, since the time delays produced in $I_d$ between consecutive inputs are at least $\frac{1}{d}$, they just let us detect that each IUT timeout was taken within some time frame of width $\frac{1}{d}$. Hence the actual timeout could be taken at any point within that time frame, although only the specific time defined by the specification for the corresponding state is valid. An upper bound of this uncertainty zone, where we might not know where the actual timeout is, is the area between $-\frac{1}{d}$ and $\frac{1}{d}$ time units from the actual timeout required by the specification at that state. Note that the correct timeout is included in this area.

If the test suite $I_d$ is used to test $(C_i, \{f\})$ then, how many possible timeouts of machines represented by functions of $C_i$ could fit within the uncertainty areas mentioned in (a) or (b) (which amount to a size of $\frac{1}{d}$ and $\frac{2}{d}$, respectively)? Taking into account that $C_i$ contains machines where all timeouts are multiples of $\frac{1}{i!}$, an upper bound of the number of possible

RODRÍGUEZ ET AL.: A GENERAL TESTABILITY THEORY: CLASSES, PROPERTIES, COMPLEXITY, AND TESTING REDUCTIONS 881


timeout values within the $\frac{1}{d}$ critical area (respectively, the $\frac{2}{d}$ critical area) is equal to $\frac{1/d}{1/i!}$ (resp. $\frac{2/d}{1/i!}$). On the other hand, since all timeouts $t$ must fulfill the condition $0 < t \leq M$, the number of all possible timeouts in each state of a machine of $C_i$ is $\frac{M}{1/i!}$. By dividing the two previous expressions, we infer that an upper bound of the proportion of available timeouts which fall into the critical area mentioned in (a) is $\frac{1/d}{M} = \frac{1}{dM}$, and the proportion in the critical area mentioned in (b) is $\frac{2/d}{M} = \frac{2}{dM}$.
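The division above can be checked with exact arithmetic. The following is a small sketch, where the function name `critical_proportions` is hypothetical; it only reproduces the counting argument and shows that the granularity $\frac{1}{i!}$ cancels out, so the result does not depend on $i$.

```python
from fractions import Fraction
from math import factorial


def critical_proportions(d, i, M):
    """Return the proportions of timeouts falling in critical areas (a) and (b)."""
    step = Fraction(1, factorial(i))   # timeouts are multiples of 1/i!
    in_a = Fraction(1, d) / step       # timeout values in a window of width 1/d
    in_b = Fraction(2, d) / step       # timeout values in a window of width 2/d
    total = Fraction(M) / step         # timeout values t with 0 < t <= M
    return in_a / total, in_b / total
```

For instance, with $d = 4$ and $M = 3$ the two proportions are $\frac{1}{12}$ and $\frac{2}{12}$ for any choice of $i$.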

Let us suppose that some IUT timeout deviates from the timeout actually required by the specification for the corresponding state. If the deviated IUT timeout is out of the critical areas considered in (a) and (b), then it will produce a behavior which can be detected as incorrect by $I_d$ provided that the state reached after the timeout replies a different output for at least one input. On the one hand, if the timeout mentioned in (a) is higher than $\frac{1}{d}$ then at least one test case of $I_d$ will observe the IUT behavior at the new state. Thus, if the reply of that state to some input is different from the reply of the previous state, it will be detected. On the other hand, if the timeout mentioned in (b) is further than $\frac{1}{d}$ units from the timeout required by the specification, then this excessive deviation will be discovered by at least one test case of $I_d$, provided that the reply before and after the timeout is taken is different for some input. We conclude that, if there is a wrong timeout but the reply of the machine before and after the timeout is different for at least one input, then this fault can remain undetected by $I_d$ only if the timeout is within the critical areas mentioned in (a) or (b), which account for a proportion of $\frac{1}{dM}$ and $\frac{2}{dM}$ respectively.

All in all, we conclude that a necessary (but not sufficient) condition for $I_d$ not being able to detect a fault in an incorrect function $f' \in C_i$ is that, for every state $s$ of the timeout machine represented by $f'$ such that $s$ is reachable by some sequence in $I_d$, either (i) its timeout is within the critical areas (a) or (b) (amounting to a total proportion of $\frac{3}{dM}$); or (ii) the state $s'$ reached by the timeout transition leaving $s$ produces, for all inputs, the same outputs as $s$. Note that, if for some state $s$ reachable by $I_d$ neither (i) nor (ii) holds, then the timeout of $s$ is produced at a time such that $I_d$ observes what happens before and after it for at least one point at each side (due to (i) not holding), and the behavior of both consecutive states is different for at least one input (due to (ii) not holding), so $I_d$ discovers that a timeout is taken at an unexpected time and hence a fault is detected. It is worth mentioning that, if the timeout machine represented by $f'$ is such that all transitions reachable in $d$ steps are labeled with the same outputs and timeouts as in the timeout machine represented by $f$, then $f'$ also fulfills the condition stated at the beginning of this paragraph, because the correct timeout value is also within the critical area considered in (b) (note that such $f'$ could be incorrect or not). We conclude that a function $f' \in C_i$ can be incorrect and not be detected as incorrect only if (i) or (ii) holds for each of the states reachable by $I_d$ in the timeout machine represented by $f'$.

Thus, a lower bound of the distinguishing rate of $I_d$ for $(C_i, \{f\})$ is given by the proportion of functions of $C_i$ where at least one state reached by $I_d$ is such that its timeout is out of these critical areas and the response of that state for some input is different from the response of the state reached after the timeout. In order to calculate this, let us count the proportion of functions of $C_i$ where all timeouts taken during the execution of all sequences of $I_d$ are within the critical areas or lead to a state where all inputs are replied with the same outputs as in the previous state. Let $N$ be the number of input symbols and $L$ be the number of output symbols (we assume $N, L \geq 2$). The proportion of functions in $C_i$ where a given state $s$ produces, for all inputs, the same outputs as the state it reaches through its timeout transition, is $\frac{1}{L^N}$. We deduce that the proportion of functions where, for a given state $s$ reached by $I_d$, either its timeout is within the critical areas mentioned before or for all inputs it produces the same outputs as the state reached by its timeout transition is $\frac{3}{dM} + \frac{1}{L^N} - \frac{3}{dM} \cdot \frac{1}{L^N}$.

Let us consider $i \geq d$. In order to compute the proportion of functions of $C_i$ which fulfill some property concerning only their first $d$ steps, we just have to calculate the proportion of trees of depth $d$ which fulfill the considered property. As mentioned earlier, the reason is that the number of functions in $C_i$ being exactly as each of these $d$-depth trees during their first $d$ steps is the same for all of these trees, so a proportion calculated for the set of all partially specified tree-form machines of depth $d$ is the same for the set of all partially specified tree-form machines of depth $i \geq d$.

Taking all the previous considerations into account, we can compute a lower bound of the distinguishing rate of $I_d$ for $C_i$ with $i \geq d$ as follows. We will calculate an upper bound of the proportion of incorrect functions from $C_i$ which might not produce any incorrect response when all sequences in $I_d$ are given. Let $v(k)$ denote the proportion of partial tree-form machines of depth $k$ where, for every state $s$ of the tree, either

• there is some previous input/output transition along its branch which departs from a state whose timeout is less than $\frac{1}{d}$ (note that this input/output transition might not be taken by $I_d$ because all delays between inputs are at least $\frac{1}{d}$, so $I_d$ could never reach $s$), or

• if not (so $s$ can be reached indeed) then either its timeout is within the critical $\frac{3}{dM}$ area or its responses for all inputs are exactly the same as in the previous state of the branch.

Note that, if $i \geq d$, then $v(d)$ provides an upper bound of the proportion of incorrect functions in $C_i$ which cannot be distinguished from $f$ when all test cases in $I_d$ are applied, so we have $\text{d-rate}(I_d, C_i, \{f\}) \geq 1 - v(d)$.

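Let us recursively define $v(k)$. Let $v(1) = \frac{3}{dM} + \frac{1}{L^N} - \frac{3}{dM} \cdot \frac{1}{L^N}$. Regarding the recursive case, let

\[
\begin{aligned}
v(k+1) = {} & \frac{1}{dM} \cdot v(k) \\
& + \frac{dM-1}{dM} \cdot \left( \frac{2/(dM)}{(dM-1)/(dM)} + \frac{1}{L^N} - \frac{2/(dM)}{(dM-1)/(dM)} \cdot \frac{1}{L^N} \right) \cdot v(k)^{N+1}.
\end{aligned}
\]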

Let us explain the previous expression, which denotes an upper bound of the proportion of trees of depth $k+1$ which are incorrect but could not be detected as such by $I_d$. Let $s$ be the state at the root of the tree.

(a) The first line of the expression denotes the proportion of these trees where, in particular, the timeout of $s$ is less than $\frac{1}{d}$. In this case, the proportion of trees where this happens ($\frac{1}{dM}$) is multiplied by the proportion of all cases where the subtree reached from $s$ through its timeout also fulfills the property ($v(k)$). Note that the rest of subtrees reached from $s$ in a single transition are reachable through input/output transitions, but these transitions could not be taken by $I_d$ in the worst case (as the timeout of $s$ is too small for $I_d$), so these subtrees are not required to fulfill the property (recall that we are computing an upper bound of the proportion of incorrect but undetected functions). On the contrary, the timeout transition of $s$ certainly can be taken, so we must include the $v(k)$ term indeed.

(b) The second line denotes the proportion of trees where, in particular, the timeout of $s$ is higher than $\frac{1}{d}$. In this case, the proportion of trees like this ($\frac{dM-1}{dM}$) is multiplied by the proportion of all cases where some source of potentially undetected faults happens at $s$. This proportion is the addition of the proportion of timeouts which are at most $\frac{1}{d}$ further from the required timeout at this point, conditioned to the fact that the timeout is not less than $\frac{1}{d}$ ($\frac{2/(dM)}{(dM-1)/(dM)}$), plus the proportion of trees where $s$ produces, for all inputs, the same outputs as the state reached by the timeout transition of $s$ ($\frac{1}{L^N}$), minus the proportion of trees fulfilling both conditions (to avoid counting twice). Note that all subtrees reachable from $s$ through a transition (either input/output or timeout) are required to fulfill the property in this case, because now $I_d$ certainly can reach all of them (as the timeout is higher than $\frac{1}{d}$). Thus the previous amount is multiplied by the proportion of cases where all of these subtrees fulfill the property ($v(k)^{N+1}$).

Let $p = \frac{1}{dM}$ and $q = \frac{3}{dM} + \frac{1}{L^N}$. Note that we have $0 \leq v(k) \leq 1$ for all $k \geq 1$ because $v(k)$ is a proportion. Thus we have

\[
\begin{aligned}
v(k+1) &= p \cdot v(k) + (1-p) \cdot \left( \frac{2p}{1-p} + \frac{1}{L^N} - \frac{2p}{1-p} \cdot \frac{1}{L^N} \right) \cdot v(k)^{N+1} \\
&\leq p \cdot v(k) + (1-p) \cdot \left( \frac{2p}{1-p} + \frac{1}{L^N} \right) \cdot v(k)^{N+1} \\
&= p \cdot v(k) + \left( 2p + \frac{1-p}{L^N} \right) \cdot v(k)^{N+1} \\
&\leq p \cdot v(k) + \left( 2p + \frac{1}{L^N} \right) \cdot v(k)^{N+1} \\
&\leq q \cdot v(k) + q \cdot v(k)^{N+1} \\
&\leq q \cdot v(k) + q \cdot v(k) = 2q \cdot v(k).
\end{aligned}
\]

Let us define $v'(1) = 2q$ and $v'(k+1) = 2q \cdot v'(k)$. It is clear that $v'(k) \geq v(k)$ for all $k \geq 1$. Moreover, by induction over $k$, it is trivial to prove that $v'(k) = 2^k q^k$. Hence,

\[
\text{d-rate}(I_d, C_i, \{f\}) \geq 1 - v(d) \geq 1 - v'(d) = 1 - 2^d q^d = 1 - 2^d \left( \frac{3}{dM} + \frac{1}{L^N} \right)^d.
\]

This lower bound of $\text{d-rate}(I_d, C_i, \{f\})$ is strictly increasing with $d$, and it tends to 1 as $d$ tends to infinity (note that $2^d \left( \frac{3}{dM} + \frac{1}{L^N} \right)^d$ tends to 0 as $d$ increases because $L^N \geq 4$). Moreover, let us remark that this lower bound is valid for all $i \geq d$, so it does not depend on $i$ provided that $i \geq d$. Thus, for any $\epsilon < 1$ there exists some $d$ such that $\text{d-rate}(I_d, C_i, \{f\}) > \epsilon$ for all $i \geq d$. We conclude $(C, E) \in$ Class II.
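The recursion for $v(k)$ and the closed-form bound $1 - 2^d q^d$ can be evaluated numerically as a sanity check. The sketch below uses hypothetical function names (`v_sequence`, `closed_form_bound`) and exact fractions; it assumes $dM > 1$ so that the division by $1 - p$ is defined, and it is only meant to illustrate the formulas for sample parameters, not to replace the proof.

```python
from fractions import Fraction


def v_sequence(d, M, N, L, depth):
    """Return [v(1), ..., v(depth)] following the recursive definition."""
    if d * M <= 1:
        raise ValueError("this sketch assumes d*M > 1")
    p = Fraction(1, d * M)
    r = Fraction(1, L ** N)            # proportion 1 / L^N
    v = 3 * p + r - 3 * p * r          # v(1)
    values = [v]
    cond = 2 * p / (1 - p)             # (2/dM) / ((dM-1)/dM)
    for _ in range(depth - 1):
        v = p * v + (1 - p) * (cond + r - cond * r) * v ** (N + 1)
        values.append(v)
    return values


def closed_form_bound(d, M, N, L):
    """Return the lower bound 1 - 2^d * q^d with q = 3/(dM) + 1/L^N."""
    q = Fraction(3, d * M) + Fraction(1, L ** N)
    return 1 - (2 * q) ** d
```

For sample values such as $d = 20$, $M = 1$, $N = L = 2$ we get $2q = \frac{4}{5}$, and each computed $v(k)$ stays below $(2q)^k$, in line with the chain of inequalities above.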

5.3 Case Study: Finite State Machines, Turing Machines and Finite Automata

We illustrate $\leq_F$ with examples. The first examples, dealing with FSM variants, will consist in manipulating finite computation formalisms and mapping computation formalisms into each other. Next, a more interesting example where computation formalisms are infinite, dealing with Turing machines and finite automata, will be given to illustrate a case where mapping computation formalisms to each other is impossible, but there still exists a testing reduction because we can just map the border between correctness and incorrectness.

Let us compare FSMs, extended finite state machines (EFSMs) and temporal EFSMs (TEFSMs) in terms of testing. EFSMs are FSMs where the behavior at each state is conditioned by the current values of some variables; the set of variables is finite. Each transition is enabled/disabled by a Boolean condition over current variable values, and values are updated after taking each transition. We consider that the set of possible values for each variable is finite. On the other hand, TEFSMs are EFSMs where the time at which actions occur is considered. Stimulating a TEFSM implies producing some inputs at some times. Variables are also used by TEFSMs, but some of them may denote clocks, that is, they denote the time elapsed since some previous transition was taken (see footnote 15). We will assume that the time is discrete and only executions up to a given number of time units are considered.

Let $C^{FSM}_{klm}$ represent the set of all deterministic FSMs having at most $k \in \mathbb{N}$ states, $l \in \mathbb{N}$ inputs, and $m \in \mathbb{N}$ outputs (that is, following the notation used in previous examples, we have $|I'| = l$ and $|O'| = m$). Besides, let $T_{FSM} = \{(C^{FSM}_{klm}, \{f\}) \mid k, l, m \in \mathbb{N}, f \in C^{FSM}_{klm}\}$. Similarly, let $C^{EFSM}_{klmpq}$ represent the set of all deterministic EFSMs with the same meaning of $k, l, m$ as before and having at most $p \in \mathbb{N}$ variables, where each variable can take up to $q \in \mathbb{N}$ different values, and let $T_{EFSM} = \{(C^{EFSM}_{klmpq}, \{f\}) \mid k, l, m, p, q \in \mathbb{N}, f \in C^{EFSM}_{klmpq}\}$. Finally, let $C^{TEFSM}_{klmpqrs}$ represent the set of all deterministic TEFSMs with the same meaning of $k, l, m, p, q$ as before, where the first $r$ variables denote clocks and executions can take up to $s$ time units, and let $T_{TEFSM} = \{(C^{TEFSM}_{klmpqrs}, \{f\}) \mid k, l, m, p, q, r, s \in \mathbb{N}, f \in C^{TEFSM}_{klmpqrs}\}$. Let us compare testing problems $T_{FSM}$, $T_{EFSM}$, and $T_{TEFSM}$.

FSMs can be seen as EFSMs where all transition guards are enabled. Thus, it will be easy to check $T_{FSM} \leq_F T_{EFSM}$. Given a pair $a = (C^{FSM}_{klm}, \{f\}) \in T_{FSM}$, let us consider the pair $b = (C^{EFSM}_{klm00}, \{f\}) \in T_{EFSM}$. Let us note that functions in $C^{FSM}_{klm}$ are the same functions as those in $C^{EFSM}_{klm00}$, that is,

15. TEFSMs can be thought of as a kind of timed automata.


$C^{FSM}_{klm} = C^{EFSM}_{klm00}$, so $a = b$ and any complete test suite for $b$ is also a complete test suite for $a$. Thus, functions $e$ and $t$ required by Definition 12 to transform pairs from $T_{FSM}$ into $T_{EFSM}$ and transform test suites for $T_{EFSM}$ into test suites for $T_{FSM}$, respectively, can be defined such that $e((C^{FSM}_{klm}, \{f\})) = (C^{EFSM}_{klm00}, \{f\})$ and $t$ is the identity function (see footnote 16). Given $a \in T_{FSM}$, if $I$ is a complete test suite for $e(a)$ then $t(I)$ is a complete test suite for $a$, so $T_{FSM} \leq_F T_{EFSM}$.

On the other hand, EFSMs can be seen as TEFSMs where no variable represents a clock. Let us note that, contrarily to FSMs or EFSMs, stimulating a TEFSM does not consist in producing a sequence of inputs (buttons), but it consists in producing a sequence of inputs at some specific times, that is, a test for a TEFSM is a sequence of pairs $(i, d)$ where $i$ is an input and $d$ is the time when $i$ is produced. Let us consider a function $e$ such that $e((C^{EFSM}_{klmpq}, \{f\})) = (C^{TEFSM}_{klmpq00}, \{f'\})$ with $f'((i_1, d_1) \cdots (i_n, d_n)) = f(i_1 \cdots i_n)$. Given $a \in T_{EFSM}$, a test suite for $e(a)$ consists in a set of sequences of pairs $(i, d)$ of inputs and times. However, since $e(a)$ follows the form $(C^{TEFSM}_{klmpq00}, \{f'\})$, times in these sequences are irrelevant. Given a complete test suite for $e(a)$, it is straightforward to see that the set of sequences of inputs of that suite without the times is complete for $a$. Thus, we can define $t$ as follows: $t(I) = \{(i_1 \cdots i_n) \mid ((i_1, d_1) \cdots (i_n, d_n)) \in I\}$. By considering these definitions of $e$ and $t$, the requirement imposed by Definition 12 is fulfilled: If $I$ is a complete test suite for $e(a)$ then $t(I)$ is a complete test suite for $a$. Thus we conclude $T_{EFSM} \leq_F T_{TEFSM}$. By the transitivity of $\leq_F$, we also have $T_{FSM} \leq_F T_{TEFSM}$. Thus, the problem of completely testing (i.e., testing up to completeness) FSMs can be easily transformed into the problem of completely testing TEFSMs, as one would expect.
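The two mappings used for the reduction from EFSMs to TEFSMs can be sketched in a few lines. The names `lift_to_timed` and `strip_times` are hypothetical; behaviors are modeled as Python functions over tuples, and timed sequences as tuples of (input, time) pairs, which is an assumption of this sketch rather than the paper's formal encoding.

```python
def lift_to_timed(f):
    """Build f' from f: f' ignores the time component of each (input, time) pair."""
    def f_prime(timed_seq):
        return f(tuple(i for i, _d in timed_seq))
    return f_prime


def strip_times(timed_suite):
    """Play the role of t: drop the times from every sequence of a timed test suite."""
    return {tuple(i for i, _d in seq) for seq in timed_suite}
```

Two timed sequences that differ only in their times collapse into the same untimed sequence, which reflects why times are irrelevant for pairs of the form produced by $e$.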

More interestingly, we also have $T_{EFSM} \leq_F T_{FSM}$ and $T_{TEFSM} \leq_F T_{EFSM}$. Let us note that any EFSM where the number of variables is finite, and variables can take a finite number of values, can be unfolded into an equivalent FSM where each state represents a combination of EFSM state and variable values. Let $e((C^{EFSM}_{klmpq}, \{f\})) = (C^{FSM}_{(k \cdot p \cdot q)\,l\,m}, \{f\})$. Given $a \in T_{EFSM}$, a complete test suite for $e(a)$ is a complete test suite for $a$ (note that $C^{EFSM}_{klmpq} = C^{FSM}_{(k \cdot p \cdot q)\,l\,m}$, so $e(a) = a$). Thus, if $t$ is the identity function then $e$ and $t$ fulfill the requirement of Definition 12 and we have $T_{EFSM} \leq_F T_{FSM}$.

On the other hand, let us note that a TEFSM where executions are constrained to take at most $s$ time units can be simulated by an EFSM. We can transform the TEFSM as follows. Each transition labeled by $i$ in the TEFSM is transformed into up to $s$ transitions in the EFSM, one for each pair $(i, d)$ where $0 \leq d \leq s$ represents a time. Actually, each $(i, d)$ is considered to be a single EFSM input. Due to the $s$ limit, the number of EFSM inputs and transitions is finite. Besides, each transition labeled by $(i, d)$ explicitly updates the value of all EFSM variables representing the TEFSM clocks according to $d$. Let $e((C^{TEFSM}_{klmpqrs}, \{f\})) = (C^{EFSM}_{k\,(l \cdot s)\,m\,p\,(\max(q,s))}, \{f\})$, and let $t$ map TEFSM sequences of inputs and times in such a way that each pair $(i, d)$ in a sequence is transformed into the (single) input representing that pair $(i, d)$ in the EFSM. Given $a \in T_{TEFSM}$, by construction we have that, if $I$ is a complete test suite for $e(a)$, then $t(I)$ is a complete test suite for $a$, so we conclude $T_{TEFSM} \leq_F T_{EFSM}$. By the transitivity of $\leq_F$, we have $T_{TEFSM} \leq_F T_{FSM}$, that is, the problem of finding a complete test suite for a bounded TEFSM can be transformed into the problem of finding a complete test suite for an FSM having fewer than a given number of states.

As we saw in Example 6, for all $a = (C^{FSM}_{klm}, \{f\}) \in T_{FSM}$ we have $a \in$ Class I, so $T_{FSM} \subseteq$ Class I. By Theorem 3 (c) we conclude $T_{EFSM} \subseteq$ Class I and $T_{TEFSM} \subseteq$ Class I.

The previous examples of this case study consist in mapping computation formalisms, which was possible in particular due to the finiteness of all involved computation formalisms. Next, an example of reduction where a mapping between computation formalisms is impossible (because they trivially have different computing power) is presented. Moreover, although both computation formalisms are infinite, we will see that they lie in Class I. Other kinds of examples involving infinite computation formalisms that cannot be transformed into each other will be considered in Case Study 5.4.

Let $C_{TM}$ represent all deterministic terminating Turing Machines from $\{0, 1\}^*$ to $\{yes, no\}$, and $C_{FA}$ represent all deterministic finite automata with the same type.

Let $T_{TM}$ consist of all pairs where we have to distinguish a Turing Machine $M$ from all Turing Machines answering as $M$ for at most $k \in \mathbb{N}$ inputs. Formally, let $T_{TM}$ be defined as follows: $T_{TM} = \{(\{f\} \cup C^{f,k}_{TM}, \{f\}) \mid f \in C_{TM}, k \in \mathbb{N}\}$ where $f' \in C^{f,k}_{TM}$ iff $f'$ gives the same answer as $f$ for at most $k$ words.

Similarly, let $T_{FA} = \{(\{f\} \cup C^{f,k}_{FA}, \{f\}) \mid f \in C_{FA}, k \in \mathbb{N}\}$ where $f' \in C^{f,k}_{FA}$ iff $f'$ gives the same answer as $f$ for at most $k$ words. Let us note that any pair in $T_{TM}$ or $T_{FA}$ is uniquely characterized by $f$ and $k$.

Next we will show $T_{TM} \leq_F T_{FA}$. In particular, functions $e$ and $t$ considered in Definition 12 can be defined as follows. First, $e : T_{TM} \to T_{FA}$ is such that $e((\{f\} \cup C^{f,k}_{TM}, \{f\})) = (\{f'\} \cup C^{f',k}_{FA}, \{f'\})$, where $f'$ is any function in $C_{FA}$ (say, the one answering yes for all inputs), and $t : 2^{\{0,1\}^*} \to 2^{\{0,1\}^*}$ is the identity function. Let $p = (\{f\} \cup C^{f,k}_{TM}, \{f\}) \in T_{TM}$. We have that $I$ is a complete test suite for $e(p) = (\{f'\} \cup C^{f',k}_{FA}, \{f'\})$ only if $I$ consists of (any) $k + 1$ or more different inputs. If it is so then $t(I) = I$ is also complete for $p = (\{f\} \cup C^{f,k}_{TM}, \{f\}) \in T_{TM}$. So, $T_{TM} \leq_F T_{FA}$. For all $q = (\{f\} \cup C^{f,k}_{FA}, \{f\}) \in T_{FA}$, we have that any set of $k + 1 \in \mathbb{N}$ inputs is a complete test suite for $q$, so $T_{FA} \subseteq$ Class I. By Theorem 3 (c) we conclude $T_{TM} \subseteq$ Class I. By using similar arguments we have $T_{FA} \leq_F T_{TM}$.
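The reason why any $k + 1$ different inputs suffice is a pigeonhole argument: a function agreeing with $f$ on at most $k$ words must disagree on at least one of $k + 1$ distinct words. The sketch below checks this exhaustively on a small finite universe of words. The names `distinguishes` and `check_pigeonhole` are hypothetical, and each candidate $f'$ is modeled only by the set of words where it agrees with $f$ (outside that set its yes/no answer differs), which is an assumption of this illustration.

```python
from itertools import combinations


def distinguishes(suite, agree_set):
    """A suite detects f' iff some word in it lies outside the agreement set."""
    return any(word not in agree_set for word in suite)


def check_pigeonhole(universe, k):
    """Check that the first k+1 words of the universe detect every f'
    whose agreement set with f has at most k words."""
    suite = universe[:k + 1]
    for size in range(k + 1):
        for agree in combinations(universe, size):
            if not distinguishes(suite, set(agree)):
                return False
    return True
```

A suite with only $k$ words can fail, because some candidate may agree with $f$ on exactly those $k$ words.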

5.4 Case Study: Increasing Continuous Magnitude Machines

In this case study, we introduce a model of machine that generalizes (a kind of) temporal systems, and we use testing reductions to find ways to test several variants of this model. Actually, some reductions will show interesting methods to test them and some (not so expected) inclusion in Class I (in particular, that of testing problem $T_V$). In addition, we will use these notions to briefly discuss the practical applicability of the proposed methods to test a car control device.

16. Let us note that $e$ is also the identity function.

Let $F_{k,I,O}$ consist of all deterministic FSMs with no more than $k \in \mathbb{N}$ states, the finite set of inputs $I$, and the finite set of outputs $O$. Let $T_F = \{(F_{k,I,O}, \{f\}) \mid k \in \mathbb{N} \wedge I, O \text{ are finite sets} \wedge f \in F_{k,I,O}\}$. We already know that all pairs in $T_F$ belong to Class I (see Example 3).

Let us introduce a new state machine which we will call Increasing Continuous Magnitude machine (ICM hereafter). This deterministic machine has a finite number of states and transitions as a standard FSM, though it has some new elements. In addition to standard inputs, it also receives a (never decreasing) continuous signal. For instance, this magnitude can denote the number of kilometers traveled by a car, the number of gas liters consumed by the car, or even the elapsed time. A magnitude increment counter tracks the increment of the magnitude since it was last reset to 0. There are three kinds of transitions in an ICM. First, we have the standard FSM transitions labeled by an input/output pair (called standard i/o transitions). Second, we have transitions like the former ones, though they also reset the magnitude counter to 0 when triggered (resetting i/o transitions). Finally, we also have transitions which are taken when the magnitude increment counter reaches some value $r \in \mathbb{R}^+$ specified in the transition. After the transition is (silently) taken, the counter is also reset to 0. These transitions will be called triggering transitions. We will assume that all states have exactly one outgoing triggering transition (note that we can still represent a state that cannot be abandoned without receiving any input by making its mandatory triggering transition lead to the same state). An interaction with an ICM consists in a sequence where standard inputs from a finite set $I$ can interleave with actions of the form "increase the magnitude by $x$ units" with $x \in \mathbb{R}^+$, in any order. The ICM replies to inputs with outputs, whereas magnitude increments produce no visible answer.
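The ICM behavior described above can be sketched as a small simulator. This is a minimal illustration under stated assumptions, not the paper's formal definition: the class name `ICM`, its field names, and the table encodings are hypothetical, and the sketch assumes that a triggering transition fires as soon as the counter reaches (or already exceeds) its threshold during an increase.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ICM:
    io: dict        # (state, input) -> (output, next_state, resets_counter)
    trigger: dict   # state -> (threshold r > 0, next_state)
    state: str
    counter: Fraction = Fraction(0)

    def increase(self, x):
        """Increase the magnitude by x units, silently taking triggering transitions."""
        while x > 0:
            r, nxt = self.trigger[self.state]
            room = max(r - self.counter, 0)   # assumption: fire at once if already past r
            if x < room:
                self.counter += x
                return
            x -= room
            self.state, self.counter = nxt, Fraction(0)   # trigger resets the counter

    def give(self, i):
        """Provide a standard input and return the output replied by the machine."""
        out, nxt, resets = self.io[(self.state, i)]
        self.state = nxt
        if resets:                                         # resetting i/o transition
            self.counter = Fraction(0)
        return out
```

Since increments produce no visible answer, a tester only learns about triggering transitions indirectly, through the outputs replied to later inputs.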

Let $W_{k,I,O}$ be the set of all functions which denote the behavior of some ICM with at most $k$ states and sets of inputs and outputs $I$ and $O$, respectively. Let $f \in W_{k,I,O}$. It is easy to see that $(W_{k,I,O}, \{f\}) \notin$ Class I. Let us suppose that we have observed the reaction of the IUT in these two scenarios: (a) when an input $i$ is provided after increasing the magnitude by $u_1$ units; and (b) when it is provided after increasing it by $u_2$ units with $u_2 > u_1$. We know nothing about what would happen if $i$ were given after $u$ units with $u_2 > u > u_1$. In particular, the ICM could silently move to another state just before $u$, and then again just after $u$. Thus, similarly as in the pair $(C_3, E_3)$ viewed in Example 3, all magnitude increments must be checked to get completeness. Hence $(W_{k,I,O}, \{f\}) \notin$ Class I.

However, we can enable finite completeness in many alternative related scenarios. For instance, let us suppose that we can see triggering transitions when they happen. Then, we can check the value of the magnitude counter at the exact moment when they are taken. Let us assume that we extend the original ICM set of inputs $I$ with a new input called ‘increase,’ meaning “increase the magnitude until a triggering transition is taken.” Also, the original ICM set of outputs $O$ is extended with a set of outputs ‘trigger at $x$’ with $x \in \mathbb{R}^+$, meaning “a trigger transition was taken at magnitude counter value $x$.” Let $I'$ and $O'$ be the resulting extended sets (note that $O'$ is infinite). Let $W'_{k,I',O'}$ denote the set of ICM variants modified like this and having the (extended) set of inputs $I'$, the (extended) set of outputs $O'$, and at most $k$ states, and let $T_{W'} = \{(W'_{k,I',O'}, \{f\}) \mid k \in \mathbb{N} \wedge I', O' \text{ are the described extensions for some finite sets } I, O \wedge f \in W'_{k,I',O'}\}$. We have $T_{W'} \leq_F T_F$. In order to make this reduction, we just convert any pair $(W'_{k,I',O'}, \{f\}) \in T_{W'}$ into a pair $(F_{k,I',O''}, \{f'\}) \in T_F$. In that expression, $f'$ represents an FSM where, contrarily to $f$, interactions consist in sequences of only $I'$ inputs (note that $f$ also reacts to sequences where $I'$ inputs may interleave in any order with ICM actions of the form “increase the magnitude by $x$ units” with $x \in \mathbb{R}^+$, which do not exist in $f'$). Besides, the set of outputs $O''$ consists of all outputs of $O$, together with all outputs of the form ‘trigger at $x$’ where $x$ is the label of some triggering transition of $f$, together with an additional output ‘trigger at unknown value.’ All outputs of the form ‘trigger at $x$’ given by functions in $W'_{k,I',O'}$ are translated into the ‘trigger at unknown value’ output if $x$ is not a triggering value of $f$. Note that, contrarily to $O'$, $O''$ is a finite set. We can see that any complete test suite for $(F_{k,I',O''}, \{f'\})$ is a complete test suite for $(W'_{k,I',O'}, \{f\})$, because testing $(F_{k,I',O''}, \{f'\})$ up to completeness implies fully controlling the reactions to magnitude increments in $(W'_{k,I',O'}, \{f\})$. So, $T_{W'} \leq_F T_F$. Since $T_F \subseteq$ Class I, we have $T_{W'} \subseteq$ Class I.

Finite completeness can also be enabled if we assume

that we know the finite set $J$ of values $x \in \mathbb{R}^+$ that label triggering transitions in the IUT (or we know any finite superset of $J$). Let $W''_{k,I,O,J} \subseteq W_{k,I,O}$ denote all ICMs with at most $k$ states, set of inputs $I$, and set of outputs $O$, where the set of values labeling all triggering transitions is a subset of $J$. Let $T_{W''} = \{(W''_{k,I,O,J}, \{f\}) \mid k \in \mathbb{N} \wedge I, O \text{ are finite sets} \wedge \text{all values in triggering transitions belong to } J \wedge f \in W''_{k,I,O,J}\}$. We have $T_{W''} \leq_F T_F$. In particular, we can convert any pair $(W''_{k,I,O,J}, \{f\}) \in T_{W''}$ into a pair $(F_{k',I',O}, \{f'\}) \in T_F$ where

• $I'$ consists of all inputs of $I$ together with a set of new inputs ‘increase magnitude by $x$’ for all $x \in J$ (note that $J$ is finite, so $I'$ remains finite as well);

• $f'$ represents an FSM that behaves (partially) as the ICM represented by $f$. In particular, FSM states are pairs $(s, m)$ where $s$ is a state of the ICM and $m$ is a magnitude value which must be the addition of 0 or more values of $J$ and must not be higher than $max$ (where $max$ is the highest value of $J$). In addition, $max + 1$ (meaning ‘any value higher than $max$’) is also an allowed value for $m$. Note that the set of allowed values of $m$ is finite, so the set of states of the FSM is finite as well. Transitions for each FSM state $(s, m)$ are defined to simulate the corresponding ICM: $s$ changes as defined by the corresponding standard or resetting i/o transition, and $m$ is increased in ‘increase magnitude by $x$’ inputs (with $x \in J$) by $x$ units as long as $m + x \leq max$, else it is set to $max + 1$. Besides, $m$ turns to 0 in FSM transitions which denote an ICM resetting i/o or triggering transition; and

• $k'$ equals $k$ (the number of allowed states in ICMs represented by $W''_{k,I,O,J}$) multiplied by the number of possible values of $m$ according to the previous explanation. Note that any ICM represented by $W''_{k,I,O,J}$ can be converted, according to the aforementioned construction, into an FSM with $k'$ states.

Since all values labeling triggering transitions in the ICM representing the IUT belong to $J$ (by hypothesis), we have that any complete test suite for $(F_{k',I',O}, \{f'\})$ is also a complete test suite for $(W''_{k,I,O,J}, \{f\})$ (any complete test suite for $(F_{k',I',O}, \{f'\})$ must check all FSM transitions, which by the way also implies checking all reactions of the original ICM to magnitude increments). We infer $T_{W''} \leq_F T_F$.
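The finiteness argument for the magnitude component can be made concrete with a small sketch. The function names below are hypothetical and not part of the paper, and exact rationals are assumed for the labels in $J$ so that sums compare reliably. The sketch enumerates the allowed values of $m$ (sums of values of $J$ not exceeding the highest value of $J$, plus the sentinel one unit above it) and derives the resulting state bound $k'$.

```python
from fractions import Fraction


def magnitude_values(J):
    """Finite set of magnitude values m used in FSM states (s, m).

    Assumption: J is a non-empty collection of positive rationals.
    Allowed values are sums of zero or more elements of J that do not
    exceed max(J), plus the sentinel max(J) + 1 ("anything above max").
    """
    labels = [Fraction(x) for x in J]
    top = max(labels)
    reachable = {Fraction(0)}
    frontier = [Fraction(0)]
    while frontier:
        m = frontier.pop()
        for x in labels:
            nxt = m + x
            # Only sums that stay within max(J) become new FSM magnitudes.
            if nxt <= top and nxt not in reachable:
                reachable.add(nxt)
                frontier.append(nxt)
    return reachable | {top + 1}


def fsm_state_bound(k, J):
    """k' = k multiplied by the number of allowed magnitude values."""
    return k * len(magnitude_values(J))


print(sorted(magnitude_values([2, 3])))
print(fsm_state_bound(5, [2, 3]))
```

The loop terminates because every label is strictly positive, so each step moves closer to the cap.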

We modify the previous case to denote a similar scenario. Rather than the set $J$, we could just know the greatest common divisor $gcd$ of all values in $J$, which makes sense if all real values in $J$ turn out to be also naturals, or at least rational numbers. In addition, let us suppose that we also know an upper bound $max$ on the highest value labeling a triggering transition in the IUT. Let $T_{W'''}$ be this new testing problem. We can perform the same reduction into $T_F$ as in the previous paragraph, though now ICMs are translated into FSMs where a single ‘increase magnitude by $x$’ new input is added, where $x = gcd$. States also have the form $(s, m)$, though now $m$ can take any multiple of $gcd$ from 0 to the lowest multiple that is higher than $max$. We infer $T_{W'''} \leq_F T_F$, so $T_{W'''}$ is also included in Class I. Note that this case is very similar to assuming that magnitudes can take only natural numbers and we know an upper bound of the maximum label of triggering transitions. In addition, let us consider the following problem variant. Let $T_{W''''} \subseteq T_{W''}$ be the same problem as the problem $T_{W''}$ we saw in the previous paragraph, though now all values in $J$ are rational numbers. After proving $T_{W'''} \subseteq$ Class I as proposed in this paragraph, proving that $T_{W''''}$ is included in Class I is much easier than doing it by proving $T_{W''} \subseteq$ Class I. Since all values of $J$ are assumed to be rational values in $T_{W''''}$, there exists a greatest common divisor for all of them, so the reduction $T_{W''''} \leq_F T_{W'''}$ is trivial. Since $T_{W'''} \subseteq$ Class I, we deduce $T_{W''''} \subseteq$ Class I.

Let us introduce a more interesting scenario that (somehow unexpectedly) enables finite completeness. Note that, even if the conditions required by $T_{W'''}$ apply, the size of the complete test suite dramatically increases as the greatest common divisor decreases. For instance, if two ICM triggering transitions trigger at values 4703.2 and 4703.25, then all magnitude increments of 0.05 units up to (at least) 4703.25 have to be checked at each ICM state in order to get completeness. Next we present a scenario where this problem is avoided.
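The blow-up in this example can be checked with a few lines of exact arithmetic. This is only an illustrative sketch: `rational_gcd` and `increments_needed` are hypothetical helper names, and the labels are assumed to be given as decimal strings so that they convert to exact fractions.

```python
from fractions import Fraction
from math import gcd


def rational_gcd(values):
    """Greatest common divisor of positive rationals.

    For reduced fractions, gcd(a/b, c/d) = gcd(a, c) / lcm(b, d).
    """
    num, den = 0, 1
    for q in (Fraction(v) for v in values):
        num = gcd(num, q.numerator)
        den = den * q.denominator // gcd(den, q.denominator)  # running lcm
    return Fraction(num, den)


def increments_needed(values):
    """How many gcd-sized steps it takes to reach the largest label."""
    labels = [Fraction(v) for v in values]
    return max(labels) / rational_gcd(labels)


step = rational_gcd(["4703.2", "4703.25"])
print(step)                                        # the common divisor
print(increments_needed(["4703.2", "4703.25"]))    # steps per ICM state
```

With these two labels the divisor is 1/20, so tens of thousands of increments are needed at every ICM state, which is the cost the next scenario avoids.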

Let us discard all previously presented hypotheses and assume the following ones:

(a) All values labeling triggering transitions are higher than some given $l \in \mathbb{R}^+$.

(b) For some given $r \in \mathbb{R}^+$ with $2r < l$, any wrong output happening within less than $r$ magnitude units away from another magnitude value where that output can be produced by the specification ICM will not be considered as a fault. In order to forbid cumulative small deviations where none individually exceeds the limit, total magnitude increments from the execution beginning are considered. For instance, let us suppose that the sequence “increase 5.3 units; give $i_1$ and receive $o_1$; increase 7.9 units; give $i_2$ and receive $o_2$” cannot be produced by the specification ICM, but the sequence “increase 5.7 units; give $i_1$ and receive $o_1$; increase 7.2 units; give $i_2$ and receive $o_2$” can. If we set $r = 0.5$, then the former sequence is not considered as a fault, because the first output happens within the 0.5 units margin from 5.7, and the second one happens within the 0.5 units margin from $5.7 + 7.2 = 12.9$ (it happens at value $5.3 + 7.9 = 13.2$). However, if magnitude 7.9 is replaced by 7.0 in the former sequence, then the resulting sequence is considered wrong, because now $o_2$ happens 0.6 units away from its allowed value (it happens at $5.3 + 7.0 = 12.3$, whereas the allowed value is 12.9).
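The arithmetic of the example in (b) can be reproduced with a short sketch. It makes a simplifying assumption that is not in the text: the observed sequence is compared against one fixed reference sequence of the specification with the same inputs and outputs, whereas the actual criterion ranges over every sequence the specification ICM can produce. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def within_tolerance(observed, allowed, r):
    """Compare cumulative magnitude totals of two increment sequences.

    Returns True when every observed total lies strictly less than r
    units away from the corresponding allowed total.
    """
    margin = Fraction(r)
    obs_totals = accumulate(Fraction(x) for x in observed)
    ref_totals = accumulate(Fraction(x) for x in allowed)
    return all(abs(o - a) < margin for o, a in zip(obs_totals, ref_totals))


# Totals 5.3 and 13.2 against 5.7 and 12.9: deviations 0.4 and 0.3.
print(within_tolerance(["5.3", "7.9"], ["5.7", "7.2"], "0.5"))
# Totals 5.3 and 12.3 against 5.7 and 12.9: second deviation is 0.6.
print(within_tolerance(["5.3", "7.0"], ["5.7", "7.2"], "0.5"))
```

Decimal strings are used so that the comparison is exact; binary floating point could misjudge values that sit right at the margin.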

Let $V_{k,I,O,l}$ be the set of all functions denoting the behavior of some ICM with at most $k$ states and sets of inputs and outputs $I$ and $O$, respectively, where the values of all triggering transitions are higher than $l$. Let $f \in V_{k,I,O,l}$ and let $V_{f,r} \subseteq V_{k,I,O,l}$ be the subset of all functions in $V_{k,I,O,l}$ which are correct with respect to $f$ according to criterion (b). Let $T_V = \{(V_{k,I,O,l}, V_{f,r}) \mid k \in \mathbb{N} \wedge l, r \in \mathbb{R}^+ \wedge r < l \wedge f \in V_{k,I,O,l} \wedge I, O \text{ are finite sets} \wedge V_{f,r} \subseteq V_{k,I,O,l} \text{ contains all functions being correct with respect to } f \text{ according to (b)}\}$. Let us show that we have $T_V \leq_F T_F$, which by the way will prove $T_V \subseteq$ Class I. We can transform each pair $(V_{k,I,O,l}, V_{f,r}) \in T_V$ into a pair $(F_{k',I',O}, \{f'\}) \in T_F$ where

• $I'$ consists of all inputs of $I$ together with the following new input: ‘raise magnitude’;

• $f'$ is a function representing an FSM behaving as follows. Its states are pairs $(s, m)$ where $s$ is a state of the ICM and $m$ is a magnitude value which can be any multiple of $l$ from 0 to the highest multiple of $l$ that is lower than $max$, where $max$ is the maximum value labeling a triggering transition in the specification machine. Moreover, $m$ can also be $r$ and $x - r$ for all $x \in \mathbb{R}^+$ labeling a triggering transition of the specification machine. Note that $s$ and $m$ can take a finite set of values, so the resulting machine has a finite number of states indeed. Transitions between states $(s, m)$ are defined to simulate the state and magnitude changes in ICM transitions. Let us suppose we are at the FSM state $(s, m)$. Taking ICM standard i/o transitions modifies $s$ according to the transition and leaves $m$ unmodified, whereas ICM resetting i/o transitions modify $s$ as specified by the transition and also set $m$ to 0. ICM triggering transitions are discarded and replaced in the FSM by new transitions triggered by a new input in $I'$ called ‘raise magnitude’. The input ‘raise magnitude’ simulates, in the FSM, some (partial or total) increase of the magnitude towards the lowest or the highest bound allowed by the (unique) ICM triggering transition value at $s$. Let $x$ be the triggering value at $s$, and let us assume that the ICM triggering transition at $s$ leads to $s'$. The FSM input ‘raise magnitude’ has the following effect:

(i) If the current magnitude value $m$ is the highest multiple of $l$ being less than $x - r$, then the magnitude is raised to reach $x - r$. This means that the new state reached in the FSM is $(s, x - r)$.

(ii) Else, if the current magnitude value $m$ equals $x - r$, then the magnitude is raised to reach $x + r$ (note that this increment is lower than $l$ because we required $2r < l$). This means that the new state reached in the FSM is $(s', r)$ (note that it is not $(s', 0)$).

(iii) Else it simulates that the magnitude is increased up to the next multiple of $l$. That is, the new state reached in the FSM is $(s, m')$ where $m'$ is the smallest multiple of $l$ that is higher than $m$.

• $k' = k \cdot (\lceil max / l \rceil + k + 1)$. That is, $k'$ equals $k$ (the number of allowed states in ICMs represented by $V_{k,I,O,l}$) multiplied by an upper bound of the number of possible values of $m$ as mentioned above. Note that any ICM represented by $V_{k,I,O,l}$ can be converted, according to the aforementioned construction, into an FSM with $k'$ states.
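The effect of the ‘raise magnitude’ input can be sketched as a single transition function. This is a hedged illustration that follows cases (i), (ii), and (iii) literally; the function names and the argument layout are assumptions made here, and exact rationals are used so that the equality tests on $m$ are reliable.

```python
from fractions import Fraction
from math import ceil, floor


def raise_magnitude(s, m, x, s_next, l, r):
    """One 'raise magnitude' step from FSM state (s, m).

    x is the triggering value at ICM state s and s_next is the target of
    that triggering transition; l and r are the constants of (a) and (b).
    """
    m, x, l, r = (Fraction(v) for v in (m, x, l, r))
    lower = x - r
    # Highest multiple of l that is strictly below x - r.
    highest_below = (ceil(lower / l) - 1) * l
    if m == highest_below:
        return (s, lower)                     # case (i): reach x - r
    if m == lower:
        return (s_next, r)                    # case (ii): jump to x + r
    return (s, (floor(m / l) + 1) * l)        # case (iii): next multiple


def state_bound(k, max_label, l):
    """k' = k * (ceil(max / l) + k + 1)."""
    return k * (ceil(Fraction(max_label) / Fraction(l)) + k + 1)


state = ("s", 0)
for _ in range(4):
    state = raise_magnitude(state[0], state[1], 25, "s2", 10, 2)
    print(state)
```

With $l = 10$, $r = 2$ and a trigger at $x = 25$, the loop walks through the magnitudes 10, 20, 23 and then moves to the target state with magnitude 2.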

Let us note that any complete test suite for $(F_{k',I',O}, \{f'\})$ must also be a complete test suite for $(V_{k,I,O,l}, V_{f,r})$ after all ‘raise magnitude’ FSM inputs appearing in the complete test suite for FSMs are replaced by the corresponding magnitude increments they represent in each case according to cases (i), (ii), and (iii) presented above. That is, if a test suite can guarantee that an FSM behaves as the FSM resulting from the previous conversion, then the original ICM is correct as well. On the one hand, by using only $+l$ (or less) magnitude increments, there is no risk of missing any triggering transition from the IUT: Since all triggering transitions trigger at increments strictly higher than $l$, no more than one triggering transition may happen between two consecutive multiples of $l$. Thus, there is no risk of leaving any IUT state unchecked. On the other hand, by checking triggering values just $r$ units before and after the required triggering value, we can check whether the behavior at both points corresponds to the behavior of the ICM at the state we are supposed to be at just before the triggering transition and the expected state just after it. If both behaviors are as expected, then the triggering must have happened between both values, as required by condition (b) introduced earlier. We conclude that the proposed reduction enables $T_V \leq_F T_F$, so we have $T_V \subseteq$ Class I.

Let us suppose that we want to extend ICMs to make them depend on two independent magnitudes. Both magnitudes have their own counter, both can be independently reset in transitions, and both can launch their own independent triggering transitions when their counters reach some specific values. Also, let us assume that both may have their own independent $l$ and $r$ constants as defined in the previous case. The problem of testing ICMs with two magnitudes under these assumptions (say $T_{V^2}$) can be reduced to the problem of testing ICMs as considered in the previous case ($T_V$) by using almost the same construction as in the reduction $T_V \leq_F T_F$ seen before. In particular, we get rid of one magnitude of the 2-magnitude ICM by codifying it into the FSM part (i.e., states and transitions) of the 1-magnitude ICM (again, we use the ‘raise magnitude’ additional input for simulating increases on the removed magnitude). We conclude $T_{V^2} \leq_F T_V$. Let $T_{V^n}$ be the problem of testing ICMs with $n$ independent magnitudes. Similarly, it can be reduced to $T_{V^{n-1}}$. Since we also have $T_V \leq_F T_F$ and $T_F \subseteq$ Class I, by induction and the transitivity of $\leq_F$ we trivially infer that $T_{V^n} \subseteq$ Class I for all $n \in \mathbb{N}$. In order to get the actual complete test suite derivation algorithm for $T_{V^n}$ instances, we just have to compose all the reductions from each problem to the subsequent one, next apply a derivation algorithm for FSMs (e.g., test all sequences with $2n + 1$ inputs, where $n$ is the number of states), and next iteratively translate complete test suites back up to $T_{V^n}$. In particular, we convert all ‘raise magnitude’ inputs, used at each magnitude hiding, into the true magnitude increments corresponding at each state $(s, m)$.
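The FSM derivation step mentioned as an example (all input sequences with $2n + 1$ inputs) is easy to sketch. The function name is hypothetical, and the sketch only enumerates the sequences; translating each ‘raise magnitude’ symbol back into a concrete increment would still follow the rules of the corresponding reduction.

```python
from itertools import product


def exhaustive_suite(inputs, n_states):
    """Lazily yield every input sequence of length 2 * n_states + 1."""
    length = 2 * n_states + 1
    return product(inputs, repeat=length)


fsm_inputs = ["i1", "i2", "raise magnitude"]
suite = exhaustive_suite(fsm_inputs, n_states=2)
first = next(suite)
print(first, len(first))
```

The suite has $|I'|^{2n+1}$ sequences, so it is finite but grows quickly with the number of FSM states produced by the reductions.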

Let us discuss some practical implications of the previous testing methodology. First, note that the time can be considered as any other magnitude of an ICM. In order to increase this magnitude, we just have to let the time pass by. Second, let us address the following question: Can we use that previous testing methodology for $T_{V^n}$, inferred by consecutively applying reductions, to test ICMs where magnitudes are not independent? Let us illustrate this issue with the following example. We want to test a car control device which depends on the increments of the number of traversed kilometers, the number of consumed gasoline liters, and the amount of elapsed time. We may argue that, when the device is deployed in an actual car, we could hardly observe kilometer increments without associated increments in time or in the consumed gasoline. Let us suppose that we can test the device before it is deployed in a car. In the laboratory, we are free to stimulate all IUT sensors (time, kilometers, gas consumption) independently from each other. Let us suppose that we apply the test suite derivation algorithm mentioned in the previous paragraph (in particular, the one for three-magnitude ICMs) to test this IUT, and the IUT passes it. Since the derived test suite is complete, it means that the IUT is correct according to our criteria (a) and (b) mentioned earlier. Moreover, since the test suite discards all possible wrong IUTs according to these criteria, in particular it also discards all possible wrong IUTs where magnitude increments are constrained to those that could really happen inside a car. For instance, interactions where every 1 kilometer we add 1 minute and 0.05 liters are also possible in an independent-magnitude setting, so they are also required to be right by a complete test suite. Thus, any possible wrong behavior, unleashed by a realistic after-deployment magnitude increment pattern, is also guaranteed not to happen if the test suite is passed (even if no test in the suite increases 1 kilometer every 1 minute and 0.05 liters), because the test suite is complete. Hence, if the IUT passes this laboratory test (developed for ICMs where magnitudes are independent), then it will also work in deployment conditions, as long as magnitude sensors also work well and the IUT logical definition does not “change” in deployment conditions (e.g., due to chip overheating caused by the engine). That is, the three-independent-magnitudes testing methodology completely checks the logic of the IUT, also for the particular logic scenarios that may happen under deployment conditions.

Alternatively, we could study how to explicitly impose constraints between increments of different magnitudes in the testing reductions given before, so that the completeness is required only for those magnitude increments that can be produced after the real deployment. Note that, in this case, the new derived complete test suites would not be required to guarantee the correctness when we add 100 kilometers, 0 minutes, and 0 gas liters consumption, because this case would be considered as impossible under deployment conditions. Studying these alternative reductions is beyond the scope of this case study.

Let us note that all the reductions given in this case study could be easily adapted to the case where magnitudes are allowed to both increase and decrease, and triggering transitions can be launched by both increments and decrements. For instance, in the $T_V \leq_F T_F$ reduction given before, we would just have to consider both increments towards increment triggering transitions and decrements towards decrement triggering transitions. This way, adding e.g., a temperature increase/decrease factor to our car device example would be simple.

Finally, let us note that the $T_{W'} \leq_F T_F$ and $T_V \leq_F T_F$ reductions given in this example are particularly interesting because they are asymmetric: In both cases, the computation formalism of any pair in the origin testing problem is infinite (in particular, there are infinitely many deterministic ICMs with some given number of states and fulfilling the specific constraints considered in these cases, because both cases enable infinite possible values to label triggering transitions), but the computation formalism of all pairs of the destination problem in both cases, i.e., $T_F$, is finite (the set of all behaviors which can be represented by an FSM with a given number of states is finite). In the rest of reductions studied in this case study, we have finiteness at both sides or infiniteness at both sides. Anyway, in all cases considered in the case study, the testing problems belong to Class I.

6 PROOFS OF COMPLEXITY RESULTS

Proof of Theorem 1. First, let us prove CS $\in$ NP. Let us note that the representation size of $C$ and $E$ is in $O(|C| \cdot |I| \cdot |O|)$ and the representation size of $\nleftrightarrow$ is in $O(|O|^2)$. Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $K$ as input of CS, we prove that checking whether some given set of inputs $\mathcal{I} \subseteq I$ is such that $|\mathcal{I}| \leq K$ and $\mathcal{I}$ is a complete test suite for $C$ and $E$ requires polynomial time. Since $\mathcal{I} \subseteq I$, we calculate $|\mathcal{I}|$ in time $O(|I|)$. Next we have to check that, for all pairs $(f, f')$ with $f \in E$ and $f' \in C \setminus E$, there exists some $i \in \mathcal{I}$ such that for all $o \in f(i)$ and $o' \in f'(i)$ we have $o \nleftrightarrow o'$. This can be done in time $O(|C|^2 \cdot |I| \cdot |O|^2)$. Thus, CS $\in$ NP.
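The polynomial-time certificate check described above can be sketched directly. The data layout is an assumption made for illustration: `C` maps each function name to a dictionary from inputs to sets of outputs, `E` is the set of names of correct functions, and `differ` plays the role of the distinguishing relation. All names are hypothetical.

```python
def distinguishes(f, g, i, differ):
    """Input i distinguishes f from g when every output pair differs."""
    return all(differ(o, o2) for o in f[i] for o2 in g[i])


def is_complete_suite(C, E, suite, differ):
    """Every correct/incorrect pair must be distinguished by some input."""
    incorrect = [name for name in C if name not in E]
    return all(
        any(distinguishes(C[f], C[g], i, differ) for i in suite)
        for f in E
        for g in incorrect
    )


def check_certificate(C, E, suite, K, differ):
    """The NP verifier: size bound first, then completeness."""
    return len(suite) <= K and is_complete_suite(C, E, suite, differ)


C = {
    "f1": {"i1": {"a1"}, "i2": {"a2", "b2"}},
    "bad": {"i1": {"c1", "b1"}, "i2": {"c2", "b2"}},
}
E = {"f1"}
trivial = lambda o, o2: o != o2
print(check_certificate(C, E, {"i1"}, 1, trivial))
print(check_certificate(C, E, {"i2"}, 1, trivial))
```

The nested loops mirror the bound in the proof: pairs of functions, times inputs of the suite, times pairs of outputs.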

In order to prove CS $\in$ NP-complete, we polynomially reduce an NP-complete problem to CS. We will consider the Set Cover problem. This NP-complete problem is defined as follows: Given a set $S$, a collection $J$ of subsets of $S$, and a natural number $K \in \mathbb{N}$, find a set cover $J'$ for $S$ (i.e., find a subset $J' \subseteq J$ such that every element in $S$ belongs to at least one member of $J'$) such that $|J'| \leq K$.¹⁷

Given a set $S = \{x_1, \ldots, x_n\}$ and a collection of sets $J = \{S_1, \ldots, S_k\}$ with $S_j \subseteq S$ for all $1 \leq j \leq k$, we will construct some $C$, $E$, $I$, $O$ from them in polynomial time. The idea of the transformation from Set Cover to CS is the following: We construct an instance of CS where there are some correct functions and a single incorrect function; in particular, we will have $C = \{f_1, \ldots, f_n, f'\}$ and $E = \{f_1, \ldots, f_n\}$ (that is, $f'$ is the only incorrect function). For all $1 \leq j \leq n$, the pair $(f_j, f')$ with $f_j \in E$ and $f' \in C \setminus E$ will be used to represent the element $x_j$ of the set $S$. Besides, each input of CS will represent a set of $J$. In particular, $f_j$ and $f'$ will be defined in such a way that $(f_j, f')$ are distinguished by input $i_l$ if and only if $x_j \in S_l$. Since each input represents a set and each pair of correct/incorrect functions represents an element, the task of finding $K$ inputs such that all pairs of correct/incorrect functions are distinguished will be equivalent to finding $K$ sets such that all elements of $S$ are covered by them. Let us note that the set of outputs returned by a function for an input must include at least one element. In our case, each function will return at least a (unique) output for each input: For each function $f_j$ and input $i_l$, we will consider $a^j_l \in f_j(i_l)$ for some unique output $a^j_l$. We will use the trivial distinguishing relation, i.e., $o_1 \nleftrightarrow o_2$ iff $o_1 \neq o_2$. Thus, if only the unique outputs $a^j_l$ were returned, all pairs $(f_j, f')$ would be distinguished by all inputs. Following the proposed analogy between both problems, $(f_j, f')$ should be distinguished by $i_l$ only if $x_j \in S_l$. Thus, if $x_j \notin S_l$ then we must impose a condition to avoid that $(f_j, f')$ is distinguished by $i_l$. In particular, we will consider $b_l \in f_j(i_l)$ and $b_l \in f'(i_l)$ for some output $b_l$. Since $f_j(i_l) \cap f'(i_l) \neq \emptyset$, the pair $(f_j, f')$ is not distinguished by $i_l$. Hence, distinguishing $(f_j, f')$ (i.e., covering the element $x_j$) will require including another input in the test suite (i.e., another set in the collection).

We define $C, E, I, O, \nleftrightarrow$ from $J, S$ as follows:

• $C = \{f_j = (outs^j_1, \ldots, outs^j_k) \mid 1 \leq j \leq n\} \cup \{f' = (outs'_1, \ldots, outs'_k)\}$ where:

– for all $1 \leq j \leq n$ and $1 \leq l \leq k$ we have $outs^j_l = \{a^j_l\}$ if $x_j \in S_l$, else $outs^j_l = \{a^j_l, b_l\}$,

– for all $1 \leq l \leq k$ we have $outs'_l = \{a'_l, b_l\}$.

• $E = \{(outs^j_1, \ldots, outs^j_k) \mid (outs^j_1, \ldots, outs^j_k) \in C\}$

• $I = \{i_1, \ldots, i_k\}$

• $O = \{a^j_l \mid 1 \leq j \leq n \wedge 1 \leq l \leq k\} \cup \{a'_l \mid 1 \leq l \leq k\} \cup \{b_l \mid 1 \leq l \leq k\}$

• $\nleftrightarrow = \{(o_1, o_2) \mid o_1, o_2 \in O \wedge o_1 \neq o_2\}$

It is easy to check that the representation size of $C$ and $E$ is in $O(|S| \cdot |J|)$ ($|S| + 1$ functions, $|J|$ inputs for each function, and one or two outputs for each input). There are $O(|J|)$ inputs in $I$ (one per set of $J$) and $O(|S| \cdot |J|)$ outputs in $O$, so the representation size of $\nleftrightarrow$ is in $O(|S|^2 \cdot |J|^2)$. Thus, the constructed instance of CS has polynomial size.
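The construction can be exercised on a tiny Set Cover instance. The sketch below is illustrative only: the string encodings of outputs (`a{j}_{l}`, `a0_{l}` for the outputs of the incorrect function, `b{l}`) and the function names are assumptions made here. Under the trivial distinguishing relation, an input distinguishes a pair exactly when the two output sets are disjoint.

```python
def set_cover_to_cs(S, J):
    """Build (C, E) from elements S and a list J of subsets of S."""
    C = {}
    for j, x in enumerate(S, start=1):
        C[f"f{j}"] = {
            # Unique output only if x_j is in S_l, otherwise also b_l.
            f"i{l}": ({f"a{j}_{l}"} if x in S_l else {f"a{j}_{l}", f"b{l}"})
            for l, S_l in enumerate(J, start=1)
        }
    # The single incorrect function always answers its own output and b_l.
    C["f_bad"] = {f"i{l}": {f"a0_{l}", f"b{l}"} for l in range(1, len(J) + 1)}
    E = {name for name in C if name != "f_bad"}
    return C, E


def distinguished_by(C, good, i):
    """Trivial relation: distinguished iff the output sets are disjoint."""
    return C[good][i].isdisjoint(C["f_bad"][i])


def suite_is_complete(C, E, suite):
    return all(any(distinguished_by(C, f, i) for i in suite) for f in E)


S = [1, 2, 3]
J = [{1, 2}, {2, 3}, {3}]
C, E = set_cover_to_cs(S, J)
print(suite_is_complete(C, E, ["i1", "i2"]))  # mirrors the cover {S1, S2}
print(suite_is_complete(C, E, ["i1"]))        # S1 alone misses element 3
```

Each input that distinguishes $(f_j, f')$ corresponds to a set containing $x_j$, which is the correspondence the equivalence proof relies on.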

We check that there exists $J' \subseteq J$ with $\bigcup_{S' \in J'} S' = S$ and $|J'| = K$ iff there exists a complete test suite $\mathcal{I}$ for $C$, $E$ with $|\mathcal{I}| = K$:

$\Longrightarrow$: Let us suppose $J' = \{S_{a_1}, \ldots, S_{a_K}\}$ and $\bigcup_{S' \in J'} S' = S$. We check that $\mathcal{I} = \{i_{a_1}, \ldots, i_{a_K}\}$ is a complete test suite for $C$, $E$. Let us suppose that it is not. Then, there exists a pair $(f_j, f')$ with $f_j \in E$ and $f' \in C \setminus E$ such that for all $i_l \in \{i_{a_1}, \ldots, i_{a_K}\}$ we have $f_j(i_l) \cap f'(i_l) \neq \emptyset$. By the definition of $C$ and $E$, if $f_j(i_l)$ and $f'(i_l)$ share some element then this element must be $b_l$. So, for all $i_l \in \{i_{a_1}, \ldots, i_{a_K}\}$ we have $f_j(i_l) \cap f'(i_l) = \{b_l\}$. By the construction of $C$ and $E$ from $J$ and $S$, if $b_l \in f_j(i_l)$ then $x_j \notin S_l$. Thus, $x_j \notin S_l$ for all $S_l \in \{S_{a_1}, \ldots, S_{a_K}\} = J'$. Therefore, $\bigcup_{S' \in J'} S' \neq S$ and we get a contradiction.

$\Longleftarrow$: We assume that $\mathcal{I} = \{i_{a_1}, \ldots, i_{a_K}\}$ is a complete test suite for $C$, $E$, and we check that $J' = \{S_{a_1}, \ldots, S_{a_K}\}$ fulfills the property $\bigcup_{S' \in J'} S' = S$. Let us suppose it does not. Then, there exists $x_j \in S$ such that for all $S_l \in J'$ we have $x_j \notin S_l$. By the construction of $C$ and $E$ from $J$ and $S$, this implies that for all $i_l \in \{i_{a_1}, \ldots, i_{a_K}\}$ we have $b_l \in f_j(i_l)$ and $b_l \in f'(i_l)$. Thus, no input of $\mathcal{I}$ allows us to distinguish the pair $(f_j, f')$ and $\mathcal{I}$ is not a complete test suite for $C$, $E$. Hence, we have a contradiction. $\square$

17. We could consider another problem, such as the Minimum Test Collection problem. Despite its name, this problem is very different from CS and thus is not a good choice to construct a polynomial reduction from an NP-complete problem to CS. In this problem, a collection of sets is selected in such a way that, for every pair of elements, there is a set including only one of them.

Proof of Theorem 2. (a): We prove FR $\in$ P. From an instance of FR, let us construct a graph $G = (\mathcal{V}, \mathcal{E})$ as follows: $\mathcal{V} = C$, and $(f_1, f_2) \in \mathcal{E}$ for all $f_1 \in E$ and $f_2 \in C \setminus E$ such that $di(f_1, f_2, \mathcal{I})$ does not hold. By Lemma 3, a set $R \subseteq C$ is such that $\mathcal{I}$ is a complete test suite for $(C \setminus R, E \setminus R)$ iff $f_1 \in R$ or $f_2 \in R$ for all $f_1 \in E$ and $f_2 \in C \setminus E$ such that $di(f_1, f_2, \mathcal{I})$ does not hold. Since vertexes in $\mathcal{V}$ are functions in $C$ and edges in $\mathcal{E}$ are pairs of correct/incorrect functions that are not distinguished by $\mathcal{I}$, the problem of finding a set $R \subseteq C$ with $|R| \leq K$ such that $\mathcal{I}$ is a complete test suite for $(C \setminus R, E \setminus R)$ is equivalent to removing $K$ or less vertexes from $G$ in such a way that all edges are lost. That is, FR is equivalent to finding a subset of vertexes $\mathcal{V}' \subseteq \mathcal{V}$ such that $|\mathcal{V}'| \leq K$ and the subgraph $G'$ of $G$ induced by $\mathcal{V} \setminus \mathcal{V}'$ has no edges, that is, $\{(v, v') \mid (v, v') \in \mathcal{E}, \{v, v'\} \subseteq \mathcal{V} \setminus \mathcal{V}'\} = \emptyset$. Let us consider the Vertex Cover problem, which is stated as follows: Given a graph, find a set of $K$ or less vertexes such that, for each edge, at least one endpoint of the edge is included in the set. Since solving FR requires that, for each edge, at least one of the endpoints of the edge is removed, we can solve FR in terms of the Vertex Cover by considering that the vertex cover to be found represents the set of vertexes (functions) to be removed (i.e., $R$). In particular, we can do as follows: From an FR problem instance, we construct the graph $G = (\mathcal{V}, \mathcal{E})$ as defined before and next we find a vertex cover $Q$ of size $K$. Then, we have $R = Q$.

Though the Vertex Cover problem is NP-complete in general graphs, König’s theorem [4] states that it is equivalent to the Matching problem if the graph is bipartite (and $G$ is so). The Matching problem in bipartite graphs is defined as follows: Given a bipartite graph, find a set of $K$ or more edges such that no vertex of the graph is at the endpoint of more than one edge in the set. By König’s theorem, the number of edges in a maximum matching equals the number of vertices in a minimum vertex cover. Moreover, given a graph $G = (\mathcal{V}, \mathcal{E})$, both problems are solved in time $O(\sqrt{|\mathcal{V}|} \cdot |\mathcal{E}|)$ by the Hopcroft-Karp algorithm [20]. Hence, FR and the corresponding optimization problem of finding the minimum function removal can be solved by constructing $G = (\mathcal{V}, \mathcal{E})$ and next running that algorithm for $G$. Identifying all pairs $(f_1, f_2)$ such that $di(f_1, f_2, \mathcal{I})$ does not hold requires that, for all $f \in E$ and $f' \in C \setminus E$, we check whether for all inputs $i \in \mathcal{I}$ there exist $o \in f(i)$ and $o' \in f'(i)$ such that $o \leftrightarrow o'$. Thus, the cost of constructing $G$ is in $O(|C|^2 \cdot |I| \cdot |O|^2)$. The cost of executing the Hopcroft-Karp algorithm is, in terms of the FR problem instance, in $O(\sqrt{|C|} \cdot |C|^2) = O(|C|^{5/2})$. Thus, FR (and its corresponding optimization problem) can be solved in time $O(|C|^{5/2} + |C|^2 \cdot |\mathcal{I}| \cdot |O|^2)$, and we conclude FR $\in$ P.
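The step from a maximum matching to a minimum vertex cover can be sketched as follows. Note the substitution: for brevity this sketch uses the simpler augmenting-path matching (often called Kuhn's algorithm) rather than Hopcroft-Karp, so it does not achieve the time bound quoted above. The cover is then extracted by the standard alternating-reachability construction behind König's theorem. The data layout is assumed: `left` is a list of correct functions, `adj` maps each of them to the incorrect functions it is not distinguished from, and left and right names are distinct.

```python
def min_vertex_cover(left, adj):
    """Return (cover, matching_size) for a bipartite graph."""
    match_r = {}  # right vertex -> matched left vertex
    match_l = {}  # left vertex -> matched right vertex

    def augment(u, seen):
        # Try to find an augmenting path starting at left vertex u.
        for v in adj.get(u, ()):
            if v in seen:
                continue
            seen.add(v)
            if v not in match_r or augment(match_r[v], seen):
                match_r[v] = u
                match_l[u] = v
                return True
        return False

    for u in left:
        augment(u, set())

    # Alternating reachability from unmatched left vertices.
    visited_l, visited_r = set(), set()
    stack = [u for u in left if u not in match_l]
    while stack:
        u = stack.pop()
        if u in visited_l:
            continue
        visited_l.add(u)
        for v in adj.get(u, ()):
            if v not in visited_r:
                visited_r.add(v)
                if v in match_r:
                    stack.append(match_r[v])
    # Cover: unreached left vertices plus reached right vertices.
    cover = {u for u in left if u not in visited_l} | visited_r
    return cover, len(match_l)


left = ["e1", "e2", "e3"]
adj = {"e1": ["b1"], "e2": ["b1"], "e3": ["b1", "b2"]}
cover, size = min_vertex_cover(left, adj)
print(sorted(cover), size)
```

In FR terms, the returned cover is a smallest set $R$ of functions whose removal leaves no undistinguished correct/incorrect pair.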

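(b): We prove FH $\in$ NP-complete. First we prove FH $\in$ NP. Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, $\mathcal{H}$, $K$ as input of FH, we prove that we can check in polynomial time whether some given $R \subseteq \mathcal{H}$ is such that $|\bigcup_{H \in R} H| \leq K$ and $\mathcal{I}$ is a complete test suite for $(C \setminus (\bigcup_{H \in R} H), E \setminus (\bigcup_{H \in R} H))$. Since $R \subseteq \mathcal{H}$ and for all $H \in R$ we have $H \subseteq C$, we can calculate $|\bigcup_{H \in R} H|$ in time $O(|\mathcal{H}| \cdot |C|)$. Subtracting $\bigcup_{H \in R} H$ from $C$ and $E$ requires that, for all $H \in R$, we remove all functions $f \in H$ from these sets. Since for all $H \in R$ we have $H \subseteq C$, these subtractions can be done in time $O(|\mathcal{H}| \cdot |C|)$. After making these subtractions, we have to check that, for all pairs $(f, f')$ with $f \in E \setminus (\bigcup_{H \in R} H)$ and $f' \in (C \setminus (\bigcup_{H \in R} H)) \setminus (E \setminus (\bigcup_{H \in R} H))$, there exists some $i \in \mathcal{I}$ such that for all $o \in f(i)$ and $o' \in f'(i)$ we have $o \nleftrightarrow o'$. Since $|E \setminus (\bigcup_{H \in R} H)| \leq |C|$ and $|(C \setminus (\bigcup_{H \in R} H)) \setminus (E \setminus (\bigcup_{H \in R} H))| \leq |C|$, this can be done in time $O(|C|^2 \cdot |I| \cdot |O|^2)$. Thus, checking whether some given $R \subseteq \mathcal{H}$ fulfills the required conditions can be done in polynomial time. Hence, FH $\in$ NP.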

In order to prove FH $\in$ NP-complete, we polynomially reduce an NP-complete problem, 3-SAT, to FH. Next we introduce some notation related to 3-SAT. This problem is stated as follows: Given a propositional logic formula $\varphi$ expressed in conjunctive normal form where each disjunctive clause has at most three literals, is there any valuation $\nu$ satisfying $\varphi$? Let $\varphi \equiv (l^1_1 \vee l^1_2 \vee l^1_3) \wedge \cdots \wedge (l^n_1 \vee l^n_2 \vee l^n_3)$ be an input for 3-SAT. We denote by $props(\varphi) = \{p_1, \ldots, p_m\}$ the set of propositional symbols appearing in $\varphi$. We denote the $i$th disjunctive clause of $\varphi$ by $c_i$, that is, $c_i \equiv l^i_1 \vee l^i_2 \vee l^i_3$. A valuation $\nu$ is a function $\nu : props(\varphi) \longrightarrow \{\top, \bot\}$. We say that $\nu$ satisfies a clause $c_i$ if it makes true at least one literal of $c_i$. We say that $\nu$ satisfies $\varphi$ if it makes true all clauses $c_i$ of $\varphi$.

Let $\varphi \equiv (l^1_1 \vee l^1_2 \vee l^1_3) \wedge \cdots \wedge (l^n_1 \vee l^n_2 \vee l^n_3)$. We construct in polynomial time an instance $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K)$ of the FH problem such that there is a set of hypotheses $R \subseteq \mathcal{H}$ with $|\bigcup_{H \in R} H| \leq K$ such that $\mathcal{I}$ is a complete test suite for $(C \setminus (\bigcup_{H \in R} H), E \setminus (\bigcup_{H \in R} H))$ if and only if $\varphi$ is satisfiable. Before formally presenting this construction, we give an intuitive explanation. We will construct an FH instance where, for each literal $l^j_k$ of $\varphi$, there will be one function in $E$ and another function in $C \setminus E$. The test suite $\mathcal{I}$ will consist of a single input $i$, and the answers of each function $f$ to input $i$ will be defined in such a way that some specific pairs of functions will not be distinguishable by $i$. Each function will answer some output $a$ that is given only by that function. Thus, if this were the only output produced by a function then this function would be distinguishable from any other function (as we will consider the trivial distinguishing relation: $o_1 \nleftrightarrow o_2$ iff $o_1 \neq o_2$). However, if we want to make two functions $f_1, f_2$ not to be distinguished by $i$, then we will add some output $b$ to the answer of both functions to $i$. In this way, $di(f_1, f_2, \mathcal{I})$ will not hold. Thus, we can make any pair of functions (un-)distinguishable as required for our construction.

In FH, the set of functions to be removed must be the union of some hypotheses (i.e., sets of functions) from some set $\mathcal{H}$ of available hypotheses. For each literal $l^j_k$, we will have in $\mathcal{H}$ a hypothesis consisting of the function representing $l^j_k$ in $E$ and the function representing the same literal in $C \setminus E$, and there will not be any other hypotheses in $\mathcal{H}$. This guarantees that the removal of functions in $E$ and $C \setminus E$ will be symmetric: The only way to remove a function representing some literal in $E$ or $C \setminus E$ consists in removing the function representing the same literal in $C \setminus E$ and $E$, respectively.

There are two kinds of pairs of functions such that their behavior will be defined to make them undistinguishable. On the one hand, pairs of functions representing opposite literals (e.g., $p$ and $\neg p$) in $E$ and $C \setminus E$ will not be distinguishable. Given a solution to FH, we will consider that non-removed functions implicitly denote a valuation of the propositional symbols. For instance, if a function representing $\neg p$ is not removed then we are implicitly considering $\nu(p) = \bot$; if neither $p$ nor $\neg p$ are removed then $\nu(p)$ can be arbitrarily set to $\top$ or $\bot$. Thus, for each function in $E$ and each function in $C \setminus E$ representing opposite literals, at least one of them must be removed. By the symmetry of $E$ and $C \setminus E$, both $E$ and $C \setminus E$ will represent the same valuation indeed. On the other hand, for each function representing a literal $l^j_k$ in $E$ (respectively, in $C \setminus E$), this function will not be distinguishable from the functions representing the other two literals of the same clause in $C \setminus E$ (resp. $E$). Due to the definition of the hypotheses set $\mathcal{H}$, this guarantees that, for each clause, at least four out of the six functions representing the literals of each clause (three in $E$ and three in $C \setminus E$) will be removed (and, if two functions remain, then they will be the representation of the same literal in $E$ and $C \setminus E$). Thus, for each FH solution, at least $4 \cdot n$ functions must be removed (recall that $\varphi$ has $n$ clauses). Actually, if only $4 \cdot n$ functions are removed, then it means that one literal of each clause in $E$ (resp. $C \setminus E$) is made true. Moreover, literals made true in $C \setminus E$ (resp. $E$) do not contradict these literals, because pairs of contradicting functions are undistinguishable and thus at least one function of each pair is removed. By the symmetry, this means that true literals in $E$ (resp. $C \setminus E$) do not contradict true literals in the same set. Thus, if at most $K = 4 \cdot n$ functions are removed in FH then $\varphi$ must be satisfiable.

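Next we formally construct an FH instance $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K)$ from the 3-SAT instance $\varphi \equiv (l^1_1 \vee l^1_2 \vee l^1_3) \wedge \cdots \wedge (l^n_1 \vee l^n_2 \vee l^n_3)$, where we consider $props(\varphi) = \{p_1, \ldots, p_m\}$. Each literal $l^j_k$ will be represented by a function $(outs^j_k) \in E$ and $(outs'^j_k) \in C \setminus E$ including the (unique) output symbol $a^j_k$ and $a'^j_k$, respectively. Two functions denoting opposite literals in $E$ and $C \setminus E$ will be made undistinguishable by including the same output $b^q_y$, where $q$ identifies the propositional symbol and $y \in \{\top, \bot\}$. A function from $E$ (resp. $C \setminus E$) will be made undistinguishable from the functions denoting the other literals from the same clause in $C \setminus E$ (resp. $E$) by including $b^j_k$ (resp. $b'^j_k$).

• $C = \{(outs^j_k) \mid 1 \leq j \leq n, 1 \leq k \leq 3\} \cup \{(outs'^j_k) \mid 1 \leq j \leq n, 1 \leq k \leq 3\}$ where:

– for all $1 \leq j \leq n$, $1 \leq k \leq 3$ we have $outs^j_k = \{a^j_k, b^q_y, b^j_k, b'^j_{k_1}, b'^j_{k_2}\}$ where $l^j_k \equiv p_q$ or $l^j_k \equiv \neg p_q$ (in the first case, $y = \top$ else $y = \bot$) and $\{k_1, k_2\} = \{1, 2, 3\} \setminus \{k\}$.

– for all $1 \leq j \leq n$, $1 \leq k \leq 3$ we have $outs'^j_k = \{a'^j_k, b^q_y, b'^j_k, b^j_{k_1}, b^j_{k_2}\}$ where $l^j_k \equiv p_q$ or $l^j_k \equiv \neg p_q$ (in the first case, $y = \bot$ else $y = \top$) and $\{k_1, k_2\} = \{1, 2, 3\} \setminus \{k\}$.

• $E = \{(outs^j_k) \mid (outs^j_k) \in C\}$

• $I = \{i\}$

• $O = \{a^j_k \mid 1 \leq j \leq n, 1 \leq k \leq 3\} \cup \{a'^j_k \mid 1 \leq j \leq n, 1 \leq k \leq 3\} \cup \{b^j_k \mid 1 \leq j \leq n, 1 \leq k \leq 3\} \cup \{b'^j_k \mid 1 \leq j \leq n, 1 \leq k \leq 3\} \cup \{b^q_y \mid 1 \leq q \leq m, y \in \{\top, \bot\}\}$

• $\nleftrightarrow = \{(o_1, o_2) \mid o_1, o_2 \in O \wedge o_1 \neq o_2\}$

• $\mathcal{I} = \{i\}$

• $\mathcal{H} = \{\{(outs^j_k), (outs'^j_k)\} \mid 1 \leq j \leq n, 1 \leq k \leq 3\}$

• $K = 4 \cdot n$

We show that the construction of $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K)$ from $\varphi$ requires polynomial time with respect to the size of $\varphi$. There are $6 \cdot n$ functions in $C$, and for each of them the answer to a single input ($i$) is defined. This answer consists of five outputs. Besides, there are $12 \cdot n + 2 \cdot m$ outputs in $O$. Even if $\nleftrightarrow$ is extensionally defined, we have $|\nleftrightarrow| = |O|^2$, and we have $|\mathcal{H}| = |C| / 2$. In conclusion, $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K)$ can be constructed from $\varphi$ in polynomial time.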
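The size claims and the intended (un)distinguishability pattern can be checked on a small formula. The sketch below makes several assumptions that are not in the text: clauses are given as triples of nonzero integers in the DIMACS style (`q` for $p_q$, `-q` for $\neg p_q$), every clause has exactly three literals, outputs are encoded as tuples, and each function is identified with its single answer to input $i$. Under the trivial relation, two functions are undistinguished exactly when their answers share an output.

```python
def build_fh_instance(clauses):
    """Return (C, E, H, K) for a list of 3-literal clauses."""
    C, E, H = {}, set(), []
    for j, clause in enumerate(clauses, start=1):
        for k, lit in enumerate(clause, start=1):
            q, positive = abs(lit), lit > 0
            others = [k2 for k2 in (1, 2, 3) if k2 != k]
            good, bad = ("E", j, k), ("N", j, k)
            # Correct function: a, b^q_y, b^j_k and the two b' of its clause.
            C[good] = {("a", j, k), ("bq", q, positive), ("b", j, k)}
            C[good] |= {("b'", j, k2) for k2 in others}
            # Incorrect function: mirrored, with the opposite y for b^q_y.
            C[bad] = {("a'", j, k), ("bq", q, not positive), ("b'", j, k)}
            C[bad] |= {("b", j, k2) for k2 in others}
            E.add(good)
            H.append({good, bad})
    return C, E, H, 4 * len(clauses)


def undistinguished(C, f, g):
    return not C[f].isdisjoint(C[g])


clauses = [(1, -2, 3), (-1, 2, 3)]
C, E, H, K = build_fh_instance(clauses)
outputs = set().union(*C.values())
print(len(C), len(outputs), len(H), K)
```

For this formula ($n = 2$, $m = 3$) the counts match $6 \cdot n$ functions, $12 \cdot n + 2 \cdot m$ outputs, and $|\mathcal{H}| = |C| / 2$.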

Let us show that the answer of FH to ðC;E; I; O;6$ ; I ;H; KÞ is yes iff ’ is satisfiable.

$\Longrightarrow$: Let us suppose that there is a set $R \subseteq \mathcal{H}$ with $|\bigcup_{H \in R} H| \le K$ such that $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$. Then, by Lemma 3, we have $f \in \bigcup_{H \in R} H$ or $f' \in \bigcup_{H \in R} H$ for all $f \in E$ and $f' \in C \setminus E$ such that $\mathit{di}(f, f', \mathcal{I})$ does not hold. Since functions representing opposite literals are made undistinguishable in the construction of the FH instance, for each pair of functions representing opposite literals, at least one of these functions must be included in $\bigcup_{H \in R} H$. By the definition of $\mathcal{H}$, every $H \in \mathcal{H}$ is such that $H = \{f, f'\}$ for some $f \in E$ and $f' \in C \setminus E$ representing the same literal of $\varphi$. Thus, for all $f \in E$ such that $f \in \bigcup_{H \in R} H$, the function $f'$ representing the same literal as $f$ in $C \setminus E$ is such that $f' \in \bigcup_{H \in R} H$, and vice versa. Since there is no pair of functions representing opposite literals formed by a

890 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 40, NO. 9, SEPTEMBER 2014


function in $E$ and a function in $C \setminus E$ in the set $C \setminus (\bigcup_{H \in R} H)$, by the symmetry between $E$ and $C \setminus E$ there is no pair of opposite functions formed by two functions from the same set, either $E$ or $C \setminus E$. Thus, either the literals represented by functions in $E \setminus (\bigcup_{H \in R} H)$ or the literals represented in $(C \setminus E) \setminus (\bigcup_{H \in R} H)$ can be used to form a (consistent) valuation of propositional symbols as follows: We define a valuation $\nu$ where $\nu(p) = \top$ iff for some $1 \le j \le n$ and $1 \le k \le 3$ we have $l_{jk} \equiv p$ and $(\mathit{outs}_{jk}) \notin (\bigcup_{H \in R} H)$. Let us note that if $\nu$ makes true at least one literal from each clause then $\nu$ satisfies $\varphi$. By the construction of $C$ and $E$, the function $f \in E$ representing a literal $l$ is undistinguishable from the functions denoting the other two literals from the same clause in $C \setminus E$, and vice versa. Besides, $\bigcup_{H \in R} H$ cannot have a function $f \in E$ without having the function $f' \in C \setminus E$ representing the same literal in $C \setminus E$, and vice versa. Thus at most two functions, out of the six functions denoting the literals of a clause in $E$ and $C \setminus E$, can be in $C \setminus (\bigcup_{H \in R} H)$. Consequently, the condition $|\bigcup_{H \in R} H| \le K = 4 \cdot n$ can be achieved only if, out of the six functions denoting each clause, exactly four of them are in $\bigcup_{H \in R} H$. Since we have $|\bigcup_{H \in R} H| \le K$ indeed, the valuation $\nu$, which satisfies all literals not in $\bigcup_{H \in R} H$, satisfies at least one of the literals from each clause. Therefore, $\nu$ satisfies $\varphi$.

$\Longleftarrow$: Let us suppose that $\varphi$ is satisfiable. We show that there exists $R \subseteq \mathcal{H}$ such that $|\bigcup_{H \in R} H| \le K$ and $\mathcal{I}$ is a complete test suite for the pair $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$. Let $\nu$ be a valuation of $\mathit{props}(\varphi)$ satisfying $\varphi$. Let $Q \subseteq C$ be such that, for every clause $c_j \equiv l_{j1} \vee l_{j2} \vee l_{j3}$ of $\varphi$, we have

• $(\mathit{outs}_{jk}), (\mathit{outs}'_{jk}) \in Q$ for some $1 \le k \le 3$ such that either (a) for some $q$ we have $l_{jk} \equiv p_q$ and $\nu(p_q) = \top$; or (b) for some $q$ we have $l_{jk} \equiv \neg p_q$ and $\nu(p_q) = \bot$ (since $\nu$ satisfies $c_j \equiv l_{j1} \vee l_{j2} \vee l_{j3}$, there exists such $k$)

• $(\mathit{outs}_{jk_1}), (\mathit{outs}'_{jk_1}), (\mathit{outs}_{jk_2}), (\mathit{outs}'_{jk_2}) \notin Q$, where $\{k_1, k_2\} = \{1, 2, 3\} \setminus \{k\}$.

Let us consider $R = \{H \mid H \in \mathcal{H},\ H \cap Q = \emptyset\}$. Let us recall that, by the construction of $\mathcal{H}$, we have $\mathcal{H} = \{\{(\mathit{outs}_{jk}), (\mathit{outs}'_{jk})\} \mid 1 \le j \le n,\ 1 \le k \le 3\}$. Therefore, we have $Q = C \setminus (\bigcup_{H \in R} H)$ and $|\bigcup_{H \in R} H| = 4 \cdot n$. By Lemma 3, if for all $f_1 \in E$ and $f_2 \in C \setminus E$ such that $\mathit{di}(f_1, f_2, \mathcal{I})$ does not hold we have either $f_1 \in \bigcup_{H \in R} H$ or $f_2 \in \bigcup_{H \in R} H$, then $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$. By the construction of $E$ and $C \setminus E$, a pair of functions is undistinguishable if either (a) they represent opposite literals or (b) they represent different literals from the same clause. By the construction of $Q$ from $\nu$, $Q$ does not have a pair of functions representing opposite literals, and it does not have a pair of functions representing different literals from the same clause. Thus, there do not exist $f_1 \in E \cap Q$ and $f_2 \in (C \setminus E) \cap Q$ such that $\mathit{di}(f_1, f_2, \mathcal{I})$ does not hold. Since we have $E \cap Q = E \setminus (\bigcup_{H \in R} H)$ and $C \cap Q = C \setminus (\bigcup_{H \in R} H)$, we conclude that the set $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$.
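As a sanity check of the equivalence just proved, the following Python sketch decides tiny FH instances by brute force and compares the answer with brute-force satisfiability. Several things here are assumptions of the sketch rather than statements of the paper: the names `output_sets`, `fh_answer` and `satisfiable`; the encoding of literals as `(q, positive)`; the reading of Lemma 3 as "the suite is complete iff no remaining function of $E$ shares an output with a remaining function of $C \setminus E$"; and the permission for a clause to repeat a literal, which keeps the unsatisfiable example small. The output sets are rebuilt inline so that the block is self-contained.

```python
from itertools import combinations, product

def output_sets(clauses):
    """Output sets outs[j, k] (in E) and outs_p[j, k] (in C \\ E) of the construction."""
    outs, outs_p = {}, {}
    for j, clause in enumerate(clauses, start=1):
        for k, (q, positive) in enumerate(clause, start=1):
            rest = [x for x in (1, 2, 3) if x != k]
            y, y_opp = ("T", "F") if positive else ("F", "T")
            outs[j, k] = {("a", j, k), ("bq", q, y), ("b", j, k)} | {("b'", j, x) for x in rest}
            outs_p[j, k] = {("a'", j, k), ("bq", q, y_opp), ("b'", j, k)} | {("b", j, x) for x in rest}
    return outs, outs_p

def fh_answer(clauses):
    """True iff some R removes at most K = 4n functions and leaves I complete."""
    outs, outs_p = output_sets(clauses)
    keys = list(outs)  # one hypothesis {outs[key], outs_p[key]} per key
    n = len(clauses)
    # Each hypothesis removes two functions, so the bound 4n allows at most 2n of them.
    for size in range(2 * n + 1):
        for removed in combinations(keys, size):
            kept = [key for key in keys if key not in removed]
            # Distinct outputs are distinguishable, so two functions are
            # distinguished by i exactly when their output sets are disjoint.
            if all(outs[u].isdisjoint(outs_p[v]) for u in kept for v in kept):
                return True
    return False

def satisfiable(clauses, m):
    """Brute-force satisfiability over all valuations of p_1, ..., p_m."""
    for values in product([False, True], repeat=m):
        if all(any(values[q - 1] == positive for q, positive in clause)
               for clause in clauses):
            return True
    return False
```

On the satisfiable formula $(p_1 \vee \neg p_2 \vee p_3) \wedge (\neg p_1 \vee p_2 \vee p_3)$ and on the unsatisfiable formula $(p_1 \vee p_1 \vee p_1) \wedge (\neg p_1 \vee \neg p_1 \vee \neg p_1)$, `fh_answer` and `satisfiable` agree. The search is exponential in the number of hypotheses, so it is only meant for very small formulas.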

(c): We prove that HA is NP-complete. Given $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, $\mathcal{H}$, $K$ as input of HA, checking whether some given $R \subseteq \mathcal{H}$ is such that $|R| \le K$ and $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$ requires polynomial time (we can prove it similarly as in the beginning of the proof of Theorem 2 (b)). Thus, HA $\in$ NP. In order to prove that HA is NP-complete, we polynomially reduce an NP-complete problem, the Set Cover problem, to HA. The Set Cover problem, already considered in the proof of Theorem 1, is defined as follows: Given a set $S$, a collection $J$ of subsets of $S$, and a natural number $K \in \mathbb{N}$, find a set cover $J'$ for $S$ (i.e., find a subset $J' \subseteq J$ such that every element in $S$ belongs to at least one member of $J'$) such that $|J'| \le K$.

Given a problem instance of Set Cover, that is, a set $S = \{x_1, \ldots, x_n\}$, a collection of sets $J = \{S_1, \ldots, S_k\}$ with $S_j \subseteq S$ for all $1 \le j \le k$, and $K \in \mathbb{N}$, we will construct in polynomial time an HA instance from them, that is, we construct some $C$, $E$, $I$, $O$, $\nleftrightarrow$, $\mathcal{I}$, $\mathcal{H}$, $K'$ from $S$, $J$, $K$, where $C$ and $E$ represent a computation formalism and a specification, respectively. This instance is such that the solution of HA for $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K')$ is yes iff the solution of Set Cover for $(S, J, K)$ is yes as well.

The idea of the transformation from a Set Cover instance to an HA instance is the following. We consider a test suite $\mathcal{I}$ with a single input $i$. Each element $x \in S$ is represented by a function in $E$ and a function in $C \setminus E$, and there are no more functions in $C$. By defining responses of functions to input $i$ in a similar way as in the proof of Theorem 2 (b), we make undistinguishable some specific pairs of functions. In particular, functions representing the same element $x \in S$ in $E$ and $C \setminus E$ are made undistinguishable, and there are no more undistinguishable pairs of functions. In order to do this, each function produces some unique output $a$ in response to $i$, and functions representing the same element $x \in S$ in $E$ and $C \setminus E$ answer some shared output $b$ too. This will make them undistinguishable, because we will consider $o_1 \nleftrightarrow o_2$ iff $o_1 \ne o_2$. For each set $\{x_1, \ldots, x_l\} \in J$, we have a hypothesis $H \in \mathcal{H}$ consisting of the functions representing the elements $x_1, \ldots, x_l$ in $E$, and there are no more hypotheses in $\mathcal{H}$. Solving HA requires that, for each pair of undistinguishable functions, at least one of them is removed by the set $R$ of selected hypotheses. Only functions in $E$ can be removed by hypotheses. Since all functions are involved in a pair of undistinguishable functions, solving HA consists in choosing some hypotheses such that all functions from $E$ are removed. Since hypotheses represent sets in $J$, finding a set of hypotheses $R$ such that all functions in $E$ are removed and $|R| \le K$ is equivalent to finding $K$ or fewer sets of $J$ such that they cover all elements in $S$. Thus, the answer of HA is yes iff the answer of Set Cover is yes.

Next we formally construct the HA instance $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K')$ from $(S, J, K)$. Let us consider $S = \{x_1, \ldots, x_n\}$ and $J = \{S_1, \ldots, S_k\}$. Each element $x_j \in S$ is represented by a function $(\mathit{outs}_j) \in E$ and a function $(\mathit{outs}'_j) \in C \setminus E$, where $\mathit{outs}_j$ and $\mathit{outs}'_j$ denote the set of outputs answered by the corresponding functions when input $i$ is given. The unique outputs $a_j$ and $a'_j$ are

RODR�IGUEZ ET AL.: A GENERAL TESTABILITY THEORY: CLASSES, PROPERTIES, COMPLEXITY, AND TESTING REDUCTIONS 891


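included in $\mathit{outs}_j$ and $\mathit{outs}'_j$, respectively. Besides, the output $b_j$ is included in both $\mathit{outs}_j$ and $\mathit{outs}'_j$.

• $C = \{(\mathit{outs}_j) \mid 1 \le j \le n\} \cup \{(\mathit{outs}'_j) \mid 1 \le j \le n\}$, where:

  – for all $1 \le j \le n$ we have $\mathit{outs}_j = \{a_j, b_j\}$

  – for all $1 \le j \le n$ we have $\mathit{outs}'_j = \{a'_j, b_j\}$

• $E = \{(\mathit{outs}_j) \mid (\mathit{outs}_j) \in C\}$

• $I = \{i\}$

• $O = \{a_j \mid 1 \le j \le n\} \cup \{a'_j \mid 1 \le j \le n\} \cup \{b_j \mid 1 \le j \le n\}$

• $\nleftrightarrow\ = \{(o_1, o_2) \mid o_1, o_2 \in O \wedge o_1 \ne o_2\}$

• $\mathcal{I} = \{i\}$

• $\mathcal{H} = \{\{(\mathit{outs}_{j_1}), \ldots, (\mathit{outs}_{j_l})\} \mid \{x_{j_1}, \ldots, x_{j_l}\} \in J\}$

• $K' = K$

We consider the cost of constructing $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K')$ from $(S, J, K)$. We have $|C| = 2 \cdot |S|$. For each function in $C$, the answer to a single input $i$ is defined, and this answer consists of two outputs. Even if $\nleftrightarrow$ is extensionally defined, we have $|{\nleftrightarrow}| \le |O|^2$, and we have $|O| = 3 \cdot |S|$. Besides, $|\mathcal{H}| = |J|$. Thus, $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K')$ can be constructed from $(S, J, K)$ in polynomial time.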

Let us show that the answer of HA to $(C, E, I, O, \nleftrightarrow, \mathcal{I}, \mathcal{H}, K')$ is yes iff the answer of Set Cover to $(S, J, K)$ is yes.

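$\Longrightarrow$: Let us suppose that there exists $R \subseteq \mathcal{H}$ with $|R| \le K'$ such that $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$. By Lemma 3, we have $f_1 \in \bigcup_{H \in R} H$ or $f_2 \in \bigcup_{H \in R} H$ for all $f_1 \in E$ and $f_2 \in C \setminus E$ such that $\mathit{di}(f_1, f_2, \mathcal{I})$ does not hold. By the construction of $\mathcal{H}$, $H \subseteq E$ for all $H \in \mathcal{H}$, so no function of $C \setminus E$ belongs to $\bigcup_{H \in R} H$. Besides, by the construction of $C$ and $E$, for every function $f_1 \in E$ there exists $f_2 \in C \setminus E$ such that $\mathit{di}(f_1, f_2, \mathcal{I})$ does not hold. We conclude $\bigcup_{H \in R} H = E$. Each function $f \in E$ represents an element $x \in S$ and, for all $H \in \mathcal{H}$, the set $H = \{(\mathit{outs}_{h_1}), \ldots, (\mathit{outs}_{h_j})\}$ represents a set $\{x_{h_1}, \ldots, x_{h_j}\} \in J$. Since $\bigcup_{H \in R} H = E$, the sets of $J$ represented by the hypotheses in $R$ form a set cover of $S$. By construction, $K' = K$. Thus, we conclude $|R| \le K$.

$\Longleftarrow$: Let $J' = \{S_1, \ldots, S_l\} \subseteq J$ be a set cover of $S$ with $|J'| \le K$. By the construction of $\mathcal{H}$, for all $H = \{(\mathit{outs}_{j_1}), \ldots, (\mathit{outs}_{j_l})\} \in \mathcal{H}$ there exists $S' \in J$ with $S' = \{x_{j_1}, \ldots, x_{j_l}\}$. Let $R \subseteq \mathcal{H}$ be such that, for all $S' \in J'$, we have $H \in R$ for some $H = \{(\mathit{outs}_{g_1}), \ldots, (\mathit{outs}_{g_k})\}$ with $S' = \{x_{g_1}, \ldots, x_{g_k}\}$, and there are no more hypotheses in $R$. Since $J'$ is a set cover of $S$ and all functions in $E$ represent an element of $S$, we have $\bigcup_{H \in R} H = E$. By the definition of $R$ from $J'$ we have $|R| = |J'|$, so $|R| \le K' = K$. Besides, $E \setminus (\bigcup_{H \in R} H) = \emptyset$ since $\bigcup_{H \in R} H = E$. Thus, $\mathcal{I}$ is a complete test suite for $\left(C \setminus (\bigcup_{H \in R} H),\ E \setminus (\bigcup_{H \in R} H)\right)$. $\square$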
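The reduction from Set Cover to HA can be illustrated with a short Python sketch that builds the HA instance and decides both problems by brute force on small inputs. The names `build_ha_instance`, `ha_answer` and `set_cover_answer` are hypothetical, functions are encoded by their output sets, and completeness is checked as "no remaining function of $E$ shares an output with a remaining function of $C \setminus E$", which is our reading of Lemma 3 when distinct outputs are distinguishable.

```python
from itertools import combinations

def build_ha_instance(S, J, K):
    """HA instance (C, E, H, K') for a Set Cover instance (S, J, K)."""
    outs = {x: frozenset({("a", x), ("b", x)}) for x in S}     # functions in E
    outs_p = {x: frozenset({("a'", x), ("b", x)}) for x in S}  # functions in C \ E
    E = set(outs.values())
    C = E | set(outs_p.values())
    # One hypothesis per set of J: the functions of E representing its elements.
    H = [frozenset(outs[x] for x in subset) for subset in J]
    return C, E, H, K

def ha_answer(C, E, H, K):
    """True iff at most K hypotheses can be chosen so that I becomes complete."""
    for size in range(min(K, len(H)) + 1):
        for R in combinations(H, size):
            removed = set().union(*R)
            spec = E - removed
            others = (C - E) - removed
            # Two functions are distinguished by i iff their output sets are disjoint.
            if all(f.isdisjoint(g) for f in spec for g in others):
                return True
    return False

def set_cover_answer(S, J, K):
    """True iff at most K sets of J cover S."""
    for size in range(min(K, len(J)) + 1):
        for chosen in combinations(J, size):
            if set().union(*chosen) >= set(S):
                return True
    return False
```

For example, with $S = \{1, 2, 3, 4\}$ and $J = \{\{1,2\}, \{2,3\}, \{3,4\}, \{4\}\}$, both procedures answer yes for $K = 2$ and no for $K = 1$, as the proof predicts. Both searches are exponential and serve only to illustrate the correspondence.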

7 CONCLUSIONS AND FUTURE WORK

Formal testing techniques have reached a high level of maturity during the last few years. However, some common roots allowing us to relate testing methods with each other are still missing. In particular, a classification of testing problems would allow us to use our expertise about old testing problems to solve new problems. This paper tries to contribute to this (long term) goal. We have presented some general notions of testability and we have identified five classes of testability. We have studied some properties of the first class, i.e., finitely testable problems, including the complexity of finding a minimum complete test suite or measuring the completeness degree of incomplete test suites, alternative characterizations, transformations keeping the testability, the effect of adding testing hypotheses, and methods of reducing one testing problem into another. From a theoretical point of view, these techniques allow us to relate testing problems to each other as well as to classify them. From a practical point of view, they allow us to determine the (im-)possibility of finding complete test suites in different scenarios. They also allow us to reason about how far an incomplete test suite is from being complete, thus providing an implicit way to compare and select incomplete test suites. The proposed framework endows us with these capabilities even though it is highly abstract. In particular, many examples have been given to illustrate these capabilities, and new knowledge about some known testing frameworks and other useful settings has been discovered by applying our notions in the case studies section. Going one step further, the proposed general properties could be particularized and refined for specific computation formalisms.

More generally, properties presented in Section 4 allow us to envisage new paths to improve our understanding about testing in the future. This study should not be restricted to Class I; on the contrary, the properties of classes 2, 3, 4, and 5 must be studied in detail as well. In particular, the testing reduction notion could play a key role in organizing the five proposed classes into a wide variety of subclasses that should be defined, related, and studied as well: e.g., bounded finite testability (what is the size of minimum complete test suites with respect to the size of the system representation? For example, polynomial or exponential?), adaptive testability [23], probabilistic testability [32], heuristic testability, that is, using heuristics to find non-optimal (but good enough) test suites, etc.

Regarding adaptive testability, we will study the problem of testing systems in the case where the test suite is not set a priori, but the decision of which test is applied next can depend on results observed in response to previously applied tests. Let us consider the following example: We wish to check, by means of testing, whether some (terminating) programming function $f$ from natural numbers to natural numbers fulfills the specification condition "$f(f(3)) = 4$". Since we do not know $f(3)$ a priori, any test suite constructed a priori must check all natural numbers, so this problem is not in Class I if only preset test suites are considered. However, if tests can depend on previous responses then we can apply input 3 and next input $f(3)$. Thus, the set $\{3, f(3)\}$ is a finite complete test suite in the adaptive testing case, and so this problem would belong to a kind of Adaptive Class I. The differences between the classes presented in this paper if they are considered as preset, and their corresponding adaptive counterparts, should be studied. Also, we wish to study the applicability of the proposed testability notions to the related problem of learnability [1], [16].
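The adaptive strategy of this example is small enough to write down directly. The following Python sketch is only an illustration: the name `adaptive_check` and the representation of the implementation under test as a Python callable are assumptions, not part of the paper. The second input is chosen from the response to the first one, which is exactly what a preset test suite cannot do.

```python
def adaptive_check(f):
    """Check the condition f(f(3)) == 4 with two adaptively chosen tests.

    f is the implementation under test, assumed to be a terminating function
    from natural numbers to natural numbers. Returns the verdict together
    with the list of (input, output) pairs that were actually observed.
    """
    observed = []

    def apply_test(x):
        y = f(x)
        observed.append((x, y))
        return y

    first = apply_test(3)        # the first test is fixed: input 3
    second = apply_test(first)   # the second input depends on the first response
    return second == 4, observed
```

For instance, for an implementation with $f(3) = 7$ and $f(7) = 4$ the verdict is positive after applying inputs 3 and 7, whereas for the successor function the tests 3 and 4 yield $f(f(3)) = 5$ and the verdict is negative.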


Finally, we will develop a subdivision of Class 2 (the class of testing cases which are unboundedly-approachable by finite testing) in terms of the speed of the convergence towards distinguishing rate 1 (i.e., completeness) in the limit. In particular, given a pair $(C, E)$ (and perhaps also a specific sequence $C_1 \subseteq C_2 \subseteq \cdots$ converging towards $C$), how does the size of finite test suites grow with respect to the required distinguishing rate $\varepsilon < 1$? Would the size of required test suites be in, e.g., $O(n^2)$ with respect to $\frac{1}{1-\varepsilon}$? Or would it be in, e.g., $O(n \cdot \log(n))$? These kinds of asymptotic measures would allow us to know how fast we can converge, in each case, towards completeness in the limit. More importantly, this would provide us with a measure of marginal utility of tests: After applying $n$ tests, what is the utility of applying one more test, in terms of gained distinguishing rate? When should we stop applying more tests, because the marginal utility of a new test (i.e., the gained distinguishing rate) would be too small? This would allow testers to calculate the appropriate amount of time they should assign to testing.

ACKNOWLEDGMENTS

The authors thank Fernando Rubio, Alberto de la Encina, Carlos Molinero, and Manuel Núñez for their suggestions, Diego Rodríguez for his support, and the anonymous reviewers of this paper as well as those of CONCUR'09 for their interesting comments. This work is a revised and extended version of the CONCUR'09 paper [28]. It was supported by projects TIN2009-14312-C02-01, TIN2012-39391-C04-04, and TIN2012-36812-C02-01.

REFERENCES

[1] D. Angluin, "Computational learning theory: Survey and selected bibliography," in Proc. 24th Annu. ACM Symp. Theory Comput., 1992, pp. 351–369.

[2] G. Bernot, M.-C. Gaudel, and B. Marre, "Software testing based on formal specification: A theory and a tool," Softw. Eng. J., vol. 6, pp. 387–405, 1991.

[3] I. Berrada, R. Castanet, P. Félix, and A. Salah, "Test case minimization for real-time systems using timed bound traces," in Proc. 18th IFIP TC6/WG6.1 Int. Conf. Testing Commun. Syst., 2006, pp. 289–305.

[4] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications. Amsterdam, The Netherlands: North Holland, 1976.

[5] E. Brinksma and J. Tretmans, "Testing transition systems: An annotated bibliography," in Proc. 4th Summer School Model. Verification Parallel Processes, 2001, pp. 187–195.

[6] Y. Cheon and G. Leavens, "A simple and practical approach to unit testing: The JML and JUnit way," in Proc. 16th Eur. Conf. Object-Oriented Program., 2002, pp. 231–255.

[7] J. Cherniavsky and C. Smith, "A recursion theoretic approach to program testing," IEEE Trans. Softw. Eng., vol. 13, no. 7, pp. 777–784, Jul. 1987.

[8] J. Cherniavsky and R. Statman, "Testing: An abstract approach," in Proc. 2nd Workshop Softw. Testing Verification, Anal., 1998, pp. 38–44.

[9] T. S. Chow, "Testing software design modeled by finite-state machines," IEEE Trans. Softw. Eng., vol. SE-4, no. 3, pp. 178–187, May 1978.

[10] M. Davis and E. Weyuker, "Metric space-based test-base adequacy criteria," Comput. J., vol. 31, no. 1, pp. 17–24, 1988.

[11] R. DeMillo, R. Lipton, and F. Sayward, "Hints on test data selection: Help for the practicing programmer," IEEE Comput., vol. 11, no. 4, pp. 34–41, Apr. 1978.

[12] H. Do, G. Rothermel, and A. Kinneer, "Prioritizing JUnit test cases: An empirical assessment and cost-benefits analysis," Empirical Softw. Eng., vol. 11, no. 1, pp. 33–70, 2006.

[13] R. Dorofeeva, K. El-Fakih, S. Maag, A. Cavalli, and N. Yevtushenko, "FSM-based conformance testing methods: A survey annotated with experimental evaluation," Inf. Softw. Technol., vol. 52, no. 12, pp. 1286–1297, 2010.

[14] C. Gaston, P. L. Gall, N. Rapin, and A. Touil, "Symbolic execution techniques for test purpose definition," in Proc. 18th IFIP TC6/WG6.1 Int. Conf. Testing Commun. Syst., 2006, pp. 1–18.

[15] M.-C. Gaudel, "Testing can be formal, too," in Proc. 6th CAAP/FASE, Theory Practice Softw. Develop., 1995, pp. 82–96.

[16] E. M. Gold, "Language identification in the limit," Inf. Control, vol. 10, no. 5, pp. 447–474, 1967. [Online]. Available: http://www.isrl.uiuc.edu/ amag/langev/paper/gold67limit.html

[17] M. Hennessy, Algebraic Theory of Processes. Cambridge, MA, USA: MIT Press, 1988.

[18] R. M. Hierons, "Comparing test sets and criteria in the presence of test hypotheses and fault domains," ACM Trans. Softw. Eng. Methodol., vol. 11, no. 4, pp. 427–448, 2002.

[19] R. M. Hierons, "Verdict functions in testing with a fault domain or test hypotheses," ACM Trans. Softw. Eng. Methodol., vol. 18, no. 4, pp. 1–19, 2009.

[20] J. E. Hopcroft and R. M. Karp, "An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs," SIAM J. Comput., vol. 2, no. 4, pp. 225–231, 1973.

[21] W. Howden, "Weak mutation testing and completeness of test sets," IEEE Trans. Softw. Eng., vol. 8, pp. 371–379, Jul. 1982.

[22] M. Krichen and S. Tripakis, "Black-box conformance testing for real-time systems," in Proc. 11th Int. SPIN Workshop Model Checking Softw., 2004, pp. 109–126.

[23] D. Lee and M. Yannakakis, "Principles and methods of testing finite state machines: A survey," Proc. IEEE, vol. 84, no. 8, pp. 1090–1123, Aug. 1996.

[24] N. López, M. Núñez, and I. Rodríguez, "Specification, testing and implementation relations for symbolic-probabilistic systems," Theoretical Comput. Sci., vol. 353, no. 1–3, pp. 228–248, 2006.

[25] K. Meinke, "A stochastic theory of black-box software testing," in Algebra, Meaning, and Computation. New York, NY, USA: Springer, 2006, pp. 578–595.

[26] M. Merayo, M. Núñez, and I. Rodríguez, "Extending EFSMs to specify and test timed systems with action durations and timeouts," IEEE Trans. Comput., vol. 57, no. 6, pp. 835–844, Jun. 2008.

[27] A. Petrenko, "Fault model-driven test derivation from finite state models: Annotated bibliography," in Proc. 4th Summer School Model. Verification Parallel Processes, 2001, pp. 196–205.

[28] I. Rodríguez, "A general testability theory," in Proc. 20th Int. Conf. Concurrency Theory, 2009, pp. 572–586.

[29] I. Rodríguez, M. Merayo, and M. Núñez, "HOTL: Hypotheses and observations testing logic," J. Logic Algebraic Program., vol. 74, no. 2, pp. 57–93, 2008.

[30] K. Romanik and J. Vitter, "Using Vapnik-Chervonenkis dimension to analyze the testing complexity of program segments," Inf. Comput., vol. 128, no. 2, pp. 87–108, 1996.

[31] J. Springintveld, F. Vaandrager, and P. D'Argenio, "Testing timed automata," Theoretical Comput. Sci., vol. 254, no. 1/2, pp. 225–257, 2001.

[32] M. Stoelinga and F. Vaandrager, "A testing scenario for probabilistic automata," in Proc. 30th Int. Colloq. Automata, Language Program., 2003, pp. 464–477.

[33] J. Tretmans, "Conformance testing with labelled transition systems: Implementation relations and test generation," Comput. Netw. ISDN Syst., vol. 29, pp. 49–79, 1996.

[34] J. Tretmans, "Testing concurrent systems: A formal approach," in Proc. 10th Int. Conf. Concurrency Theory, 1999, pp. 46–65.

[35] H. Zhu and X. He, "A methodology of testing high-level Petri nets," Inf. Softw. Technol., vol. 44, no. 8, pp. 473–489, 2002.

Ismael Rodríguez received the MS and PhD degrees in computer science in 2001 and 2004, respectively. He is an associate professor in the Computer Systems and Computation Department, Complutense University of Madrid, Spain. He has published more than 80 papers in international refereed conferences and journals. His research interests cover formal methods, testing techniques, swarm and evolutionary optimization methods, and functional programming. He received the Best Thesis Award of his faculty in 2004 and the Best Paper Award in the IFIP WG 6.1 FORTE 2001 conference.


Luis Llana received the MS and PhD degrees in mathematics in 1991 and 1996, respectively. He is an associate professor in the Computer Systems and Computation Department, Complutense University of Madrid, Spain. His main research interests include formal methods and testing techniques. Currently, he is opening new research fields such as artificial vision and e-learning.

Pablo Rabanal received the MS degree in computer science in 2004, and the PhD degree in computer science in 2010, based on the development of nature-inspired techniques to solve NP-complete problems. He is an assistant professor in the Computer Systems and Computation Department, Complutense University of Madrid, Spain. He has published more than 20 papers in international refereed conferences and journals. His research interests cover swarm and evolutionary optimization methods, formal methods, testing techniques, and web services.