STUnit-6
8/6/2019
6.5 Data Flow Analysis with Arrays and Pointers
The models and flow analyses described in the preceding section
have been limited to simple scalar variables in individual
procedures. Arrays and pointers (including object references and
procedure arguments) introduce additional issues, because it is not
possible in general to determine whether two accesses refer to the
same storage location. For example, consider the following code
fragment:
1 a[i] = 13;
2 k = a[j];
Are these two lines a definition-use pair? They are if the values of i
and j are equal, which might be true on some executions and not on
others. A static analysis cannot, in general, determine whether they
are always, sometimes, or never equal, so a source of imprecision
is necessarily introduced into data flow analysis.
Pointers and object references introduce the same issue, often in
less obvious ways. Consider the following snippet:
1 a[2] = 42;
2 i = b[2];
It seems that there cannot possibly be a definition-use pair
involving these two lines, since they involve none of the same
variables. However, arrays in Java are dynamically allocated
objects accessed through pointers. Pointers of any kind introduce
the possibility of aliasing , that is, of two different names referring
to the same storage location. For example, the two lines above
might have been part of the following program fragment:
1 int[] a = new int[3];
2 int[] b = a;
3 a[2] = 42;
4 i = b[2];
Here a and b are aliases , two different names for the same
dynamically allocated array object, and an assignment to part of a
is also an assignment to part of b.
The same phenomenon, and worse, appears in languages with
lower-level pointer manipulation. Perhaps the most egregious
example is pointer arithmetic in C:
1 p = &b;
2 *(p + i) = k;
It is impossible to know which variable is defined by the second
line. Even if we know the value of i, the result is dependent on how
a particular compiler arranges variables in memory.
Dynamic references and the potential for aliasing introduce
uncertainty into data flow analysis. In place of a definition or use
of a single variable, we may have a potential definition or use of a
whole set of variables or locations that could be aliases of each
other. The proper treatment of this uncertainty depends on the use
to which the analysis will be put. For example, if we seek strong
assurance that v is always initialized before it is used, we may not
wish to treat an assignment to a potential alias of v as initialization,
but we may wish to treat a use of a potential alias of v as a use of
v.
A useful mental trick for thinking about treatment of aliases is to
translate the uncertainty introduced by aliasing into uncertainty
introduced by control flow. After all, data flow analysis already
copes with uncertainty about which potential execution paths will
actually be taken; an infeasible path in the control flow graph may
add elements to an any-paths analysis or remove results from an
all-paths analysis. It is usually appropriate to treat uncertainty
about aliasing consistently with uncertainty about control flow. For
example, considering again the first example of an ambiguous
reference:
1 a[i] = 13;
2 k = a[j];
We can imagine replacing this by the equivalent code:
1 a[i] = 13;
2 if (i == j) {
3 k = a[i];
4 } else {
5 k = a[j];
6 }
In the (imaginary) transformed code, we could treat all array
references as distinct, because the possibility of aliasing is fully
expressed in control flow. Now, if we are using an any-path
analysis like reaching definitions, the potential aliasing will result
in creating a definition-use pair. On the other hand, an assignment
to a[j] would not kill a previous assignment to a[i]. This suggests
that, for an any-path analysis, gen sets should include everything
that might be referenced, while kill sets should include only what
is definitely referenced.
We cannot determine whether the two arguments fromCust and
toCust are references to the same object without looking at the
context in which this method is called. Moreover, we cannot
determine whether fromHome and fromWork are (or could be)
references to the same object without more information about how
CustInfo objects are treated elsewhere in the program.
Sometimes it is sufficient to treat all nonlocal information as
unknown. For example, we could treat the two CustInfo objects as
potential aliases of each other, and similarly treat the four
PhoneNum objects as potential aliases. Sometimes, though, large
sets of aliases will result in analysis results that are so imprecise as
to be useless. Therefore data flow analysis is often preceded by an
interprocedural analysis to calculate sets of aliases or the locations
that each pointer or reference can refer to.
6.6 Interprocedural Analysis
Most important program properties involve more than one
procedure, and as mentioned earlier, some interprocedural analysis
(e.g., to detect potential aliases) is often required as a prelude even
to intraprocedural analysis. One might expect the interprocedural
analysis and models to be a natural extension of the intraprocedural
analysis, following procedure calls and returns like intraprocedural
control flow. Unfortunately, this is seldom a practical option.
If we were to extend data flow models by following control flow
paths through procedure calls and returns, using the control flow
graph model and the call graph model together in the obvious way,
we would observe many spurious paths. Figure 6.13 illustrates the
problem: Procedure foo and procedure bar each make a call on
procedure sub. When procedure call and return are treated as if
they were normal control flow, in addition to the execution
sequences (A,X,Y,B) and (C,X,Y,D), the combined graph contains
the impossible paths (A,X,Y,D) and (C,X,Y,B).
Figure 6.13: Spurious execution paths result when procedure calls
and returns are treated as normal edges in the control flow graph.
The path (A,X,Y,D) appears in the combined graph, but it does not
correspond to an actual execution order.
It is possible to represent procedure calls and returns precisely, for
example by making a copy of the called procedure for each point at
which it is called. This would result in a context-sensitive analysis.
The shortcoming of context-sensitive analysis was already
mentioned in the previous chapter: The number of different
contexts in which a procedure must be considered could be
exponentially larger than the number of procedures. In practice, a
context-sensitive analysis can be practical for a small group of
closely related procedures (e.g., a single Java class), but is almost
never a practical option for a whole program.
Some interprocedural properties are quite independent of context
and lend themselves naturally to analysis in a hierarchical,
piecemeal fashion. Such a hierarchical analysis can be both precise
and efficient. The analyses that are provided as part of normal
compilation are often of this sort. The unhandled exception
analysis of Java is a good example: Each procedure (method) is
required to declare the exceptions that it may throw without
handling. If method M calls method N in the same or another class,
and if N can throw some exception, then M must either handle that
exception or declare that it, too, can throw the exception. This
analysis is simple and efficient because, when analyzing method
M, the internal structure of N is irrelevant; only the results of the
analysis at N (which, in Java, is also part of the signature of N) are
needed.
Two conditions are necessary to obtain an efficient, hierarchical
analysis like the exception analysis routinely carried out by Java
compilers. First, the information needed to analyze a calling
procedure must be small: It must not be proportional either to the
size of the called procedure, or to the number of procedures that
are directly or indirectly called. Second, it is essential that
information about the called procedure be independent of the
caller; that is, it must be context-independent. When these two
conditions are true, it is straightforward to develop an efficient
analysis that works upward from leaves of the call graph. (When
there are cycles in the call graph from recursive or mutually
recursive procedures, an iterative approach similar to data flow
analysis algorithms can usually be devised.)
Unfortunately, not all important properties are amenable to
hierarchical analysis. Potential aliasing information, which is
essential to data flow analysis even within individual procedures, is
one of those that are not. We have seen that potential aliasing can
depend in part on the arguments passed to a procedure, so it does
not have the context-independence property required for an
efficient hierarchical analysis. For such an analysis, additional
sacrifices of precision must be made for the sake of efficiency.
Even when a property is context-dependent, an analysis for that
property may be context-insensitive, although the context-
insensitive analysis will necessarily be less precise as a
consequence of discarding context information. At the extreme, a
linear time analysis can be obtained by discarding both context and
control flow information.
Context- and flow-insensitive algorithms for pointer analysis
typically treat each statement of a program as a constraint. For
example, on encountering an assignment
1 x = y;
where y is a pointer, such an algorithm simply notes that x may
refer to any of the same objects that y may refer to. References(x)
⊇ References(y) is a constraint that is completely independent of
the order in which statements are executed. A procedure call, in
such an analysis, is just an assignment of values to arguments.
Using efficient data structures for merging sets, some analyzers
can process hundreds of thousands of lines of source code in a few
seconds. The results are imprecise, but still much better than the
worst-case assumption that any two compatible pointers might
refer to the same object.
The best approach to interprocedural pointer analysis will often lie
somewhere between the astronomical expense of a precise,
context- and flow-sensitive pointer analysis and the imprecision of
the fastest context- and flow-insensitive analyses. Unfortunately,
there is not one best algorithm or tool for all uses. In addition to
context and flow sensitivity, important design trade-offs include
the granularity of modeling references (e.g., whether individual
fields of an object are distinguished) and the granularity of
modeling the program heap (that is, which allocated objects are
distinguished from each other).
Chapter 13: Data Flow Testing

 1
 2 /* External file hex_values.h defines Hex_Values[128]
 3  * with value 0 to 15 for the legal hex digits (case-insensitive)
 4  * and value -1 for each illegal digit including special characters
 5  */
 6
 7 #include "hex_values.h"
 8 /** Translate a string from the CGI encoding to plain ascii text.
 9  * '+' becomes space, %xx becomes byte with hex value xx,
10  * other alphanumeric characters map to themselves.
11  * Returns 0 for success, positive for erroneous input
12  * 1 = bad hexadecimal digit
13  */
14 int cgi_decode(char *encoded, char *decoded) {
15     char *eptr = encoded;
16     char *dptr = decoded;
17     int ok = 0;
18     while (*eptr) {
19         char c;
20         c = *eptr;
21
22         if (c == '+') { /* Case 1: '+' maps to blank */
23             *dptr = ' ';
24         } else if (c == '%') { /* Case 2: '%xx' is hex for character xx */
25             int digit_high = Hex_Values[*(++eptr)];
26             int digit_low = Hex_Values[*(++eptr)];
27             if (digit_high == -1 || digit_low == -1) {
28                 /* *dptr = '?'; */
29                 ok = 1; /* Bad return code */
30             } else {
31                 *dptr = 16 * digit_high + digit_low;
32             }
33         } else { /* Case 3: All other characters map to themselves */
34             *dptr = *eptr;
35         }
36         ++dptr;
37         ++eptr;
38     }
39     *dptr = '\0'; /* Null terminator for string */
40     return ok;
41 }

Figure 13.1: The C function cgi_decode, which translates a cgi-encoded string to a plain ASCII string (reversing the encoding applied by the common gateway interface of most Web servers). This program is also used in Chapter 12 and also presented in Figure 12.1 of Chapter 12.
13.2 Definition-Use Associations
13.3 Data Flow Testing Criteria
Various data flow testing criteria can be defined by requiring
coverage of DU pairs in various ways.
The all DU pairs adequacy criterion requires each DU pair
to be exercised in at least one program execution, following
the idea that an erroneous value produced by one statement
(the definition) might be revealed only by its use in another
statement.
A test suite T for a program P satisfies the all DU pairs
adequacy criterion iff, for each DU pair du of P , at least one
test case in T exercises du.
The corresponding coverage measure is the proportion of
covered DU pairs:
The all DU pairs coverage CDU pairs of T for P is the
fraction of DU pairs of program P exercised by at least one
test case in T .
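Written as a formula, the measure is the following. This is a reconstruction consistent with the definition above (the original displayed equation did not survive extraction):

```latex
C_{\mathit{DU\ pairs}} = \frac{\text{number of DU pairs exercised by at least one test case in } T}
                              {\text{number of DU pairs of } P}
```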
The all DU pairs adequacy criterion assures a finer grain
coverage than statement and branch adequacy criteria.
If we consider, for example, function cgi_decode, we can
easily see that statement and branch coverage can be
obtained by traversing the while loop no more than once, for
example, with the test suite Tbranch = {"+", "%3D", "%FG", "t"},
while several DU pairs cannot be covered without executing
the while loop at least twice.
The pairs that may remain uncovered after statement and
branch coverage correspond to occurrences of different
characters within the source string, and not only at the
beginning of the string.
For example, the DU pair (37, 25) for variable *eptr can
be covered with a test case TC DU pairs = "test%3D", where the
hexadecimal escape sequence occurs inside the input string,
but not with "%3D". The test suite TDU pairs, obtained by
adding the test case TC DU pairs to the test suite Tbranch, satisfies
the all DU pairs adequacy criterion, since it adds both the
cases of a hexadecimal escape sequence and an ASCII
character occurring inside the input string.
One DU pair might belong to many different execution
paths. The all DU paths adequacy criterion extends the all
DU pairs criterion by requiring each simple (nonlooping)
DU path to be traversed at least once, thus including the
different ways of pairing definitions and uses. This can
reveal a fault by exercising a path on which a definition of a
variable should have appeared but was omitted.
A test suite T for a program P satisfies the all DU paths
adequacy criterion iff, for each simple DU path dp of P ,
there exists at least one test case in T that exercises a path
that includes dp.
The corresponding coverage measure is the fraction of
covered simple DU paths:
The all DU paths coverage CDU paths of T for P is the fraction
of simple DU paths of program P executed by at least one
test case in T .
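As a formula, this measure reads as follows. This is a reconstruction consistent with the definition above (the original displayed equation did not survive extraction):

```latex
C_{\mathit{DU\ paths}} = \frac{\text{number of simple DU paths exercised by at least one test case in } T}
                              {\text{number of simple DU paths of } P}
```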
The test suite TDU pairs does not satisfy the all DU paths
adequacy criterion, since the DU pairs (37, 37) for variable
eptr and (36, 23) for variable dptr each correspond
to two simple DU paths, and in both cases one of the
two paths is not covered by test cases in TDU pairs. The
uncovered paths correspond to a test case that includes
character + occurring within the input string (e.g., test case
TC DU paths = "test+case").
Although the number of simple DU paths is often quite
reasonable, in the worst case it can be exponential in the size
of the program unit. This can occur when the code between
the definition and use of a particular variable is essentially
irrelevant to that variable, but contains many control paths,
as illustrated by the example in Figure 13.2: The code
between the definition of ch in line 2 and its use in line 12
does not modify ch, but the all DU paths coverage criterion
would require that each of the 256 paths be exercised.
 1
 2 void countBits(char ch) {
 3     int count = 0;
 4     if (ch & 1) ++count;
 5     if (ch & 2) ++count;
 6     if (ch & 4) ++count;
 7     if (ch & 8) ++count;
 8     if (ch & 16) ++count;
 9     if (ch & 32) ++count;
10     if (ch & 64) ++count;
11     if (ch & 128) ++count;
12     printf("'%c' (0X%02x) has %d bits set to 1\n",
13            ch, ch, count);
14 }
Figure 13.2: A C procedure with a large number of DU paths.
The number of DU paths for variable ch is exponential in the
number of if statements, because the use in each increment
there exists at least one test case in T that exercises a DU
pair that includes def.
The corresponding coverage measure is the proportion of
covered definitions, where we say a definition is covered
only if the value is used before being killed:
The all definitions coverage CDef of T for P is the fraction of
definitions of program P covered by at least one test case in
T .
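As a formula, this measure reads as follows. This is a reconstruction consistent with the definition above (the original displayed equation did not survive extraction):

```latex
C_{\mathit{Def}} = \frac{\text{number of definitions covered by at least one test case in } T}
                        {\text{number of definitions of } P}
```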
13.4 Data Flow Coverage with Complex Structures
Like all static analysis techniques, data flow analysis approximates the effects of program executions. It suffers imprecision in modeling dynamic constructs, in particular dynamic access to storage, such as indexed access to array elements or pointer access to dynamically allocated storage. We have seen in Chapter 6 (page 94) that the proper treatment of potential aliases involving indexes and pointers depends on the use to which analysis results will be put. For the purpose of choosing test cases, some risk of underestimating alias sets may be preferable to gross overestimation or very expensive analysis.

The precision of data flow analysis depends on the precision of alias information used in the analysis. Alias analysis requires a trade-off between precision and computational expense, with significant overestimation of alias sets for approaches that can be practically applied to real programs. In the case of data flow testing, imprecision can be mitigated by specializing the alias analysis to identification of definition-clear paths between a potentially matched definition and use. We do not need to compute aliases for all possible behaviors, but only along particular control flow paths. The risk of underestimating alias sets in a local analysis is acceptable considering the application in choosing good test cases rather than offering hard guarantees of a property.

In the cgi_decode example we have made use of information that would require either extra guidance from the test designer or a sophisticated tool for data flow and alias analysis. We may know, from a global analysis, that the parameters encoded and decoded never refer to the same or overlapping memory regions, and we may infer that initially eptr and dptr likewise refer to disjoint regions of memory, over the whole range of values that the two pointers take. Lacking this information, a simple static data flow analysis might consider *dptr a potential alias of *eptr and might therefore consider the assignment *dptr = *eptr a potential definition of both *dptr and *eptr. These spurious definitions would give rise to infeasible DU pairs, which produce test obligations that can never be satisfied. A local analysis that instead assumes (without verification) that *eptr and *dptr are distinct could fail to require an important test case if they can be aliases. Such underestimation may be preferable to creating too many infeasible test obligations.

A good alias analysis can greatly improve the applicability of data flow testing but cannot eliminate all problems. Undisciplined use of dynamic access to storage can make precise alias analysis extremely hard or impossible. For example, the use of pointer arithmetic in the program fragment of Figure 13.3 results in aliases that depend on the way the compiler arranges variables in memory.
void pointer_abuse() {
    int i = 5, j = 10, k = 20;
    int *p, *q;
    p = &j + 1;
    q = &k;
    *p = 30;
    *q = *q + 55;
    printf("p=%d, q=%d\n", *p, *q);
}
Figure 13.3: Pointers to objects in the program stack can create essentially arbitrary definition-use associations, particularly when combined with pointer arithmetic as in this example.
13.5 The Infeasibility Problem
Not all elements of a program are executable, as discussed in Section 12.8 of Chapter 12. The path-oriented nature of data flow testing criteria aggravates the problem, since infeasibility creates test obligations not only for isolated unexecutable elements, but also for infeasible combinations of feasible elements.

Complex data structures may amplify the infeasibility problem by adding infeasible paths as a result of alias computation. For example, while we can determine that x[i] is an alias of x[j] exactly when i = j, we may not be able to determine whether i can be equal to j in any possible program execution.
Fortunately, the problem of infeasible paths is usually modest for the all definitions and all DU pairs adequacy criteria, and one can typically achieve levels of coverage comparable to those achievable with simpler criteria like statement and branch adequacy during unit testing. The all DU paths adequacy criterion, on the other hand, often requires much larger numbers of control flow paths. It presents a greater problem in distinguishing feasible from infeasible paths and should therefore be used with discretion.
Open Research Issues
Data flow test adequacy criteria are close to the border between techniques that can be applied at low cost with simple tools and techniques that offer more power but at much higher cost. While in principle data flow test coverage can be applied at modest cost (at least up to the all DU adequacy criterion), it demands more sophisticated tool support than test coverage monitoring tools based on control flow alone.

Fortunately, data flow analysis and alias analysis have other important applications. Improved support for data flow testing may come at least partly as a side benefit of research in the programming languages and compilers community. In particular, finding a good balance of cost and precision in data flow and alias analysis across procedure boundaries (interprocedural or "whole program" analysis) is an active area of research.
The problems presented by pointers and complex data structures cannot be ignored. In particular, modern object-oriented languages like Java use reference semantics - an object reference is essentially a pointer - and so alias analysis (preferably interprocedural) is a prerequisite for applying data flow analysis to object-oriented programs.