[IEEE 2011 IEEE 14th International Multitopic Conference (INMIC) - Karachi, Pakistan...



Page 1: [IEEE 2011 IEEE 14th International Multitopic Conference (INMIC) - Karachi, Pakistan (2011.12.22-2011.12.24)] 2011 IEEE 14th International Multitopic Conference - Improving path selection

Improving Path Selection by Handling Loops in Automatic Test Data Generation

Sajjad Naghdali Zanjani

Computer department, Islamic Azad University Qazvin

branch, Tehran, Iran [email protected]

Mehdi Dehghan Takht fuladi

Computer department, Amirkabir University,

Tehran, Iran [email protected]

Amir Bagheri Aghababa

Computer department, Islamic Azad University Tehran East

Branch, Tehran, Iran amir _ baqeri _ [email protected]

Abstract-Path-oriented test data generation is one of the most powerful methods for generating appropriate test data: it selects all complete paths in the Control Flow Graph (CFG) and generates data that traverse the selected paths. In the path selection phase, different paths can be selected depending on the number of loop iterations, and most of them are infeasible, because in most cases the number of loop iterations is only determined dynamically during program execution. Earlier techniques either refused to handle loops or dealt with them by simplification; thus, no effective solution has been presented so far. For paths with loops, the proposed algorithm first attempts to determine the exact number of loop iterations. If the number of iterations remains unknown, it is decided by the tester. The technique is based on symbolic evaluation and loop information. As a result, all selected paths can be traversed; moreover, by reducing the number of infeasible paths, the time needed to generate test data is reduced remarkably.

Keywords-software testing; control flow graph; path selection; infeasible path; free loop; dependent loop

I. INTRODUCTION

Software testing is an important technique for assuring the quality of software; however, it is an expensive and labor-intensive task that accounts for approximately 50% of the total cost of software development [1,3]. In order to test software, test data has to be generated. Generating test data manually is slow, expensive and requires exhaustive effort. If the testing process could be automated, the cost of developing software would be reduced significantly [2,9].

Generally, test data generation is the process of identifying a set of test data that traverses most of the program paths [10]. There are three types of test data generators: pathwise test data generators, data specification generators and random test data generators. This paper focuses on pathwise test data generators, which are tools that accept a computer program and a testing criterion (e.g. total path coverage, statement coverage, branch coverage, etc.) as input and automatically generate test data that meet the selected criterion [12,14].

978-1-4577-0657-8/11/$26.00 © 2011 IEEE

The basic operation of pathwise generators consists of the following steps: program control flow graph (CFG) construction, path selection and test data generation [6,7]. This article focuses on the path selector; it is therefore assumed that the CFG constructor and the test data generator exist and work properly. The path selector automatically identifies a set of paths that satisfies the selected testing criterion.

A. Background

A control flow graph of a program P is a directed graph $G = (N, E, s, e)$ consisting of a set of nodes $N$ and a set of edges $E = \{(n, m) \mid n, m \in N\}$ that connect the nodes. In each flow graph there are two special nodes: one entry node and one exit node, $s$ and $e$ respectively. Fig. 1 shows the bubble sort algorithm and Fig. 2 represents the corresponding control flow graph [16].

Each node is either a basic block or a decision node. A basic block is a maximal sequence of program statements such that if any statement of the block is executed, all statements in the block are executed. A decision is a point in a program where control flow can diverge; IF, FOR, WHILE and SWITCH statements are decision points. An edge between two nodes $n$ and $m$ corresponds to a possible transfer of control from $n$ to $m$. All edges are labeled with a condition. If $(n_j, n_{j+1}) \in E$, then the condition of the corresponding edge must be satisfied in order to traverse from $n_j$ to $n_{j+1}$. If $n$ is a basic block, then it has only one exit edge, whose condition is always true.

In Fig. 2, nodes 1, 3, 6, 7 and 8 are basic block nodes, nodes 2, 4 and 5 are decision nodes, and node 9 is the end node. A high-level description of these nodes is given in Table I. The first column shows the node number. The second column shows the corresponding statement if the node is a basic block (e.g. the statement i = 0 for node 1) or the corresponding condition if it is a decision node (e.g. the condition i < n-1 for node 2). The third column lists each possible next node together with its condition. For instance, node 2 must be traversed after node 1. Node 3 is visited after node 2 if the constraint i < n-1 is satisfied; otherwise node 9 is traversed.


void bubbleSort(int a[], int n) {
    /* outer loop: i runs from 0 to n-2 */
    for (int i = 0; i < n-1; i++)
        /* inner loop: j runs from n-1 down to i+1 */
        for (int j = n-1; j > i; j--)
            if (a[j] < a[j-1])
                swap(a[j], a[j-1]);
}

Figure 1. Bubble sort algorithm

Figure 2. Control flow graph of bubble sort algorithm

Paths in a CFG are defined as $p = n_i, n_{i+1}, \ldots, n_j$ where $n_k \in N$ and $(n_k, n_{k+1}) \in E$. A complete path starts at the entry node $s$ and finishes at the exit node $e$. If program $F$ with input $x$ traverses path $p$, then $p$ is said to be traversed by $F(x)$. Path $p$ is feasible if there exists an input $x \in D_F$ (the input domain of $F$) for which $F(x)$ traverses $p$; otherwise $p$ is infeasible. In other words, path $p$ is infeasible when its constraints cannot be satisfied. It is worth mentioning that determining infeasible paths is time consuming and that their exact recognition is only possible manually [11].

Loops are commonly used in programs. In this article they are divided into two distinct classes. In the first class, called free loops, the number of iterations is unknown and depends on input variables. In the second class, called dependent loops, the number of iterations is determined by the program code that has already been executed. Although dependent loops have a definite number of iterations, this number cannot be calculated statically before the program is executed; it can only be determined through dynamic methods and symbolic execution of the program.

TABLE I. BUBBLE SORT CFG DESCRIPTION

| Node number | Node expression | Next node (condition) |
|---|---|---|
| 1 | i=0 | 2 (true) |
| 2 | i<n-1 | 3 (i<n-1), 9 (i>=n-1) |
| 3 | j=n-1 | 4 (true) |
| 4 | j>i | 5 (j>i), 8 (j<=i) |
| 5 | a[j]<a[j-1] | 6 (a[j]<a[j-1]), 7 (a[j]>=a[j-1]) |
| 6 | swap(a[j],a[j-1]) | 7 (true) |
| 7 | j-- | 4 (true) |
| 8 | i++ | 2 (true) |
| 9 | End node | |
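The node descriptions in Table I and the definition of a complete path can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper names `is_decision`, `is_path` and `is_complete_path` are assumptions of this sketch, not part of the paper.

```python
# Table I encoded as an adjacency map: node -> list of (next node, edge condition).
# The layout and all helper names are illustrative assumptions.
CFG = {
    1: [(2, "true")],
    2: [(3, "i < n-1"), (9, "i >= n-1")],
    3: [(4, "true")],
    4: [(5, "j > i"), (8, "j <= i")],
    5: [(6, "a[j] < a[j-1]"), (7, "a[j] >= a[j-1]")],
    6: [(7, "true")],
    7: [(4, "true")],
    8: [(2, "true")],
    9: [],  # end node
}
ENTRY, EXIT = 1, 9


def is_decision(node):
    """A decision node has more than one outgoing edge."""
    return len(CFG[node]) > 1


def is_path(nodes):
    """True if every consecutive pair of nodes is an edge of the CFG."""
    return all(any(m == succ for succ, _ in CFG[n])
               for n, m in zip(nodes, nodes[1:]))


def is_complete_path(nodes):
    """A complete path starts at the entry node and ends at the exit node."""
    return bool(nodes) and nodes[0] == ENTRY and nodes[-1] == EXIT and is_path(nodes)


print([n for n in CFG if is_decision(n)])       # [2, 4, 5]
print(is_complete_path([1, 2, 3, 4, 8, 2, 9]))  # True: structurally complete
print(is_complete_path([1, 2, 4, 9]))           # False: (2, 4) is not an edge
```

Note that this check is purely structural. Whether a complete path is also feasible depends on the edge conditions, which the sketch stores only as text.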


II. PATH SELECTION

Path selection is the process that selects complete paths from the CFG and passes them to the data generator, which then generates adequate inputs to traverse the selected paths. The selection of these paths has a significant effect on optimizing test data generation [8,11,16]. An unknown number of loop iterations in the CFG leads to an unlimited set of paths, because the number of iterations cannot be determined prior to program execution and each additional iteration creates a new, different path. The following methods can be used to limit the number of selected paths.

• Reference [15] suggests that, for paths with loops, the tester selects the paths manually. This method can reduce the number of faults, but it runs completely against the goal of automating test data generation, and it requires exhaustive effort and a long period of time.

• Reference [13] offers a method in which loops are executed only once and the other iterations are ignored. This method has two fundamental problems. First, if the given loop is dependent, it usually iterates more than once, so the produced paths will be infeasible. Second, neglecting the real number of loop iterations in fact disregards part of the program code, so the generated test data cannot cover the entire program.

• References [4,16] present a method in which the tester selects the number of loop iterations as a constant k. This method cannot deal with dependent loops, because their number of iterations is predetermined by the executed program code and cannot be chosen freely. The method is appropriate for free loops only if their iterations depend on different inputs. If two free loops depend on the same input, selecting the number of iterations for one of them automatically determines the number of iterations of the other, and the tester cannot interfere in that selection. In such cases, once the iterations of the first loop are specified, the other loop must be treated as a dependent loop; if the tester nevertheless tries to specify its iterations, infeasible paths will be created.

As an example, the CFG in Fig. 2 contains two loops. If the tester selects one iteration for the outer loop and zero iterations for the inner loop, the path 1,2,3,4,8,2,9 is selected. This path is infeasible: the outer loop iterates exactly once, so n=2, whereas the inner loop does not iterate at all, so n<2. These values of n are inconsistent. (The values are derived from Table I.)
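The inconsistency can be checked mechanically by replaying a candidate path for concrete values of n. The sketch below is a brute-force illustration, not the method of the paper: the function names and the search bound `max_n` are assumptions, and the array comparison at node 5 is ignored, so an empty result proves infeasibility while a non-empty result only shows that the loop conditions are consistent.

```python
def path_is_traversable(path, n):
    """Replay `path` on the bubble sort CFG of Table I for a concrete n.

    Only the loop conditions at nodes 2 and 4 are checked; the comparison
    at node 5 is ignored (an assumption of this sketch).
    """
    i = j = None
    for node, nxt in zip(path, path[1:]):
        if node == 1:
            i = 0
        elif node == 2:
            if (nxt == 3) != (i < n - 1):   # branch taken must match i < n-1
                return False
        elif node == 3:
            j = n - 1
        elif node == 4:
            if (nxt == 5) != (j > i):       # branch taken must match j > i
                return False
        elif node == 7:
            j -= 1
        elif node == 8:
            i += 1
    return True


def feasible_sizes(path, max_n=20):
    """All n in 0..max_n whose loop conditions are consistent with `path`."""
    return [n for n in range(max_n + 1) if path_is_traversable(path, n)]


print(feasible_sizes([1, 2, 3, 4, 8, 2, 9]))           # [] : no n fits
print(feasible_sizes([1, 2, 3, 4, 5, 7, 4, 8, 2, 9]))  # [2]
print(feasible_sizes([1, 2, 9]))                       # [0, 1]
```

The first call reproduces the example: no value of n lets the outer loop run once while the inner loop runs zero times. The second call shows that one iteration of each loop is consistent with n=2.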

This article presents an algorithm called Loops Iteration Determiner (LID). LID selects paths with loops and avoids selecting infeasible paths by determining an appropriate number of loop iterations; therefore, the cost of test data generation can be reduced significantly. LID first determines the type of each loop. If it is a dependent loop, its number of iterations is determined through symbolic evaluation; if it is a free loop, the number of iterations is decided by the tester. Finally, once the loop iterations are determined, suitable paths are selected.

In order to determine the loop types and prevent the selection of infeasible paths, some default rules of software development must be followed. These rules are explained in the next section, together with the characteristics of the CFG that follow from them. Then the symbolic evaluation procedure is introduced, and the LID algorithm, which builds on symbolic evaluation, is described in pseudo code. All stages of the algorithm are executed on one example to give a clear view of the concept. In earlier methods, paths that contain loops were handled manually or by simplification; thus, the new method cannot be compared directly with them. For this reason, the algorithm is evaluated in terms of time and space complexity.

A. Default Rules

For effective loop control, the software under development should follow some rules. Fortunately, these rules agree with structured programming principles. They are as follows:

• The "goto" statement should not be used.

• The "break" statement should not be used. (Instead, the exit condition must appear as a constraint in the "while" or "for" statement.)

• The "while" structure should be used instead of the "do-while" structure.

These are C language statements. If another programming language is used, the equivalent statements must be avoided.

B. Characteristics of CFG

Following the above rules, together with the nature of programming algorithms, gives the CFG special characteristics in comparison with other graphs. These characteristics are as follows:

• Each loop has only one node through which it can be exited, called the decision node in this article.

• The decision node has exactly two branches. One branch leads the program flow to a node from which the flow traverses the rest of the program; this node is called the exit node. The other branch returns into the loop through a node called the entry node.

• Each decision node belongs to exactly one loop, so two loops cannot share a decision node.

• Two loops can stand in one of the following relations to each other:

o The loops do not overlap, and each loop has its own distinct nodes.

o One loop lies inside the other loop.

Thus, two loops never share nodes unless one of them lies inside the other.

• Each loop has a last node; after this node is visited, the program flow returns to the decision node. Obviously, the decision node must be visited before the last node.

C. Symbolic Evaluation

Symbolic evaluation is a program analysis method that represents the value of each variable, at each point of a traversed path, as a symbolic expression over input values and constants [5]. It is performed without actually executing the program. Symbolic evaluation can be illustrated as a table in which each variable occupies one row. The number of columns equals the length of the current path, and the header of the i-th column lists the first i traversed nodes of the current path. Each cell holds the value of the variable of its row at the end of the path prefix of its column.

Table II is an example of the symbolic evaluation procedure for the path 1, 2, 3 in Fig. 2. In this table, the value of j at the end of the path 1,2,3 is n-1, and the value of i at the end of the path 1,2 is 0.
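A table of this kind can be built with a very small symbolic evaluator. The following sketch assumes that every value is a linear expression a*n + b in the single input n, stored as a pair (a, b), with None standing for "unknown"; this representation and the names `EFFECTS`, `show` and `symbolic_table` are illustrative assumptions, while the node effects are transcribed from Table I.

```python
# Symbolic values are linear expressions a*n + b stored as (a, b); None = unknown.
# Effects of the basic-block nodes of Table I on the symbolic state.
EFFECTS = {
    1: lambda s: {**s, "i": (0, 0)},                      # i = 0
    3: lambda s: {**s, "j": (s["n"][0], s["n"][1] - 1)},  # j = n - 1
    7: lambda s: {**s, "j": (s["j"][0], s["j"][1] - 1)},  # j--
    8: lambda s: {**s, "i": (s["i"][0], s["i"][1] + 1)},  # i++
}


def show(value):
    """Render a symbolic value as text, e.g. (1, -1) -> 'n-1'."""
    if value is None:
        return "unknown"
    a, b = value
    if a == 0:
        return str(b)
    head = "n" if a == 1 else f"{a}*n"
    return head if b == 0 else f"{head}{b:+d}"


def symbolic_table(path):
    """Return {path prefix: {variable: text}}, one column per prefix."""
    state = {"i": None, "j": None, "n": (1, 0)}  # n is the input variable
    table = {}
    for k, node in enumerate(path, start=1):
        state = EFFECTS.get(node, lambda s: s)(state)  # decision nodes change nothing
        table[tuple(path[:k])] = {v: show(x) for v, x in state.items()}
    return table


for prefix, column in symbolic_table([1, 2, 3]).items():
    print(prefix, column)
# (1,) {'i': '0', 'j': 'unknown', 'n': 'n'}
# (1, 2) {'i': '0', 'j': 'unknown', 'n': 'n'}
# (1, 2, 3) {'i': '0', 'j': 'n-1', 'n': 'n'}
```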

D. Data Structure of Loop

Before the algorithm is explained, all loops in the CFG should be detected and saved in the data structure that is given as pseudo code in Fig. 3. The time complexity of loop detection function is O(v\ where v denotes the number of nodes. The data structure for each loop contains the following fields:

• Loop ID: Each loop has a unique identifier, which is represented by a number.

• Decision node, exit node, entry node and last node: These are described in Section II.B.

• Current iteration: The number of loop iterations in the current path.

• Loop type: Each loop is either a free or a dependent loop; its type can also be unknown.

• Loop constraint: The loop condition that appears in the decision node.

• Maximum iterations: This number is specified by the tester and is used to determine the number of iterations of free loops.

• Variable list: A list of the variables that appear in the loop constraint.

• Input variable: The input variable on which the loop iterations depend; it is used exclusively for free loops.

Table III illustrates the data structure of the loops in Fig. 2.


TABLE II. SYMBOLIC EXECUTION TABLE IN BUBBLE SORT CFG

| Variables | 1 | 1,2 | 1,2,3 |
|---|---|---|---|
| i | 0 | 0 | 0 |
| j | unknown | unknown | n-1 |
| n | n | n | n |

struct loop {
    loopId ← identifier of current loop
    nodDecision ← decision node of loop
    nodExit ← exit node of loop
    nodEntry ← entry node of loop
    nodLast ← last node of loop
    curIteration ← number of current iteration
    loopType ← types of loops are: 0: unknown, 1: dependent, 2: independent
    loopConstraint ← a loop constraint that appears in nodDecision
    maxIteration ← maximum number of iterations, determined by the tester for free loops
    lstVariables ← list of all variables that appear in nodDecision
    varInput ← the input variable that a free loop depends on
}

Figure 3. Pseudo code of loop data structure

TABLE III. DATA STRUCTURE OF LOOPS IN BUBBLE SORT CFG

| Field | Loop L1 | Loop L2 |
|---|---|---|
| LoopID | L1 | L2 |
| nodDecision | 2 | 4 |
| nodEntry | 3 | 5 |
| nodLast | 8 | 7 |
| nodExit | 9 | 8 |
| curIteration | 0 | 0 |
| maxIteration | unknown | unknown |
| loopType | unknown | unknown |
| lstVariables | i, n | j, i |
| loopConstraint | i<n-1 | j>i |
| varInput | unknown | unknown |
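For readers who prefer executable notation, the loop record of Fig. 3 and the initial contents of Table III could be written as a Python dataclass. The snake_case field names, the `LoopType` enum and the use of None for "unknown" are choices of this sketch, not of the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LoopType(Enum):
    UNKNOWN = 0
    DEPENDENT = 1
    FREE = 2  # listed as "independent" in Fig. 3


@dataclass
class Loop:
    loop_id: str
    nod_decision: int
    nod_entry: int
    nod_exit: int
    nod_last: int
    loop_constraint: str
    lst_variables: List[str] = field(default_factory=list)
    cur_iteration: int = 0
    max_iteration: Optional[int] = None   # None stands for "unknown"
    loop_type: LoopType = LoopType.UNKNOWN
    var_input: Optional[str] = None       # None stands for "unknown"


# Initial state of the two bubble sort loops, as listed in Table III.
L1 = Loop("L1", nod_decision=2, nod_entry=3, nod_exit=9, nod_last=8,
          loop_constraint="i < n-1", lst_variables=["i", "n"])
L2 = Loop("L2", nod_decision=4, nod_entry=5, nod_exit=8, nod_last=7,
          loop_constraint="j > i", lst_variables=["j", "i"])
loop_list = [L1, L2]

for loop in loop_list:
    print(loop.loop_id, loop.nod_decision, loop.loop_type.name)
```

The exit node of the inner loop L2 (node 8) is at the same time the last node of the outer loop L1, which reflects the nesting of the two loops.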

III. THE LID ALGORITHM

In order to detect paths in the CFG, LID is driven by a depth-first search (DFS). After the DFS visits the decision node of a loop, the LID algorithm decides dynamically whether the entry node or the exit node of that loop should be visited next in order to select a feasible path. If the exit node is selected, the loop finishes; if the entry node is chosen, the loop continues.

Each node visited by the DFS is passed to the LID function as input. The return value of LID is a node of the CFG that the DFS must visit next; it is either null or the entry or exit node of a loop. If the return value is null, the DFS continues its routine. If it is the entry or exit node of a loop, the DFS ignores all successors except the returned node. Hence, LID determines the loop iterations by dictating the next node to the DFS.
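The interplay between the DFS and a function that may dictate the next node can be sketched as follows. The chooser used here, `bounded_chooser`, is a hypothetical stand-in that simply allows every loop body k times (the constant-k strategy discussed in Section II); it is not the LID algorithm and does not guarantee feasible paths. It only demonstrates the protocol "None means continue, a node means visit exactly this node".

```python
# Successor lists of the bubble sort CFG (Table I, conditions omitted).
CFG = {1: [2], 2: [3, 9], 3: [4], 4: [5, 8], 5: [6, 7],
       6: [7], 7: [4], 8: [2], 9: []}


def dfs_paths(node, exit_node, choose_next, path=()):
    """Yield complete paths; `choose_next` may force the successor of a node."""
    path = path + (node,)
    if node == exit_node:
        yield list(path)
        return
    forced = choose_next(node, path)   # plays the role of LID's return value
    successors = CFG[node] if forced is None else [forced]
    for nxt in successors:
        yield from dfs_paths(nxt, exit_node, choose_next, path)


def bounded_chooser(k):
    """Hypothetical stand-in for LID: enter each loop k times, then exit."""
    # decision node -> (entry node, exit node, node that starts the loop afresh)
    loops = {2: (3, 9, 1), 4: (5, 8, 3)}

    def choose(node, path):
        if node not in loops:
            return None                # null: the DFS continues its routine
        entry, exit_, reset = loops[node]
        start = len(path) - 1 - path[::-1].index(reset)
        visits = path[start:].count(node)  # visits since the loop was (re)started
        return entry if visits <= k else exit_

    return choose


for p in dfs_paths(1, 9, bounded_chooser(1)):
    print(p)
# [1, 2, 3, 4, 5, 6, 7, 4, 8, 2, 9]
# [1, 2, 3, 4, 5, 7, 4, 8, 2, 9]
```

Node 5 is not a loop decision node, so the chooser returns None there and the DFS explores both of its branches, which is why two paths are produced.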

LID uses the following data structures:

• tblSymbolicExe: The symbolic evaluation table explained in Section II.C.

• tblInputs: A two-column table with one row per input variable of the program. The first column of each row identifies an input variable and the second column holds its value in the current path. All input variables are initialized with an undecided value. When the number of iterations of a free loop depends on one of these input variables, that number is fixed by specifying the value of the variable, and tblInputs is then updated with the new value.

• loopList: A list of the program loops. Each element of this list is a loop structure as described in Section II.D.

• loopStack: A stack of loop IDs. The number of iterations of an inner loop must be determined before that of its outer loop, because the exit node of the inner loop must be visited before the exit node of the outer loop can be visited. When a decision node is visited and both the type of the loop and the value of its input variable are unknown, the loop ID is pushed onto the loop stack. Consequently, if a loop is nested inside another loop, the information of the inner loop is popped first.

LID also uses the following functions:

• varInputDetector: This function receives two parameters as input: the variable list of a loop and the tblSymbolicExe table. It determines the values of these variables in the current path according to tblSymbolicExe. If these values are expressed in terms of an input variable, the loop is said to depend on that input variable, and the loop's input variable field is set accordingly. (The LID method is not appropriate for free loops that depend on several input variables or that have combined constraints.)

• solveCondition: As mentioned before, the number of iterations of a free loop is determined by the tester. To finish the loop, the loop exit node must be visited after the loop decision node. The solveCondition function restates the loop constraints as symbolic expressions (input values and constants) from tblSymbolicExe in the current path. The value of the loop's input variable is then set so that it satisfies these constraints, and the corresponding entry in tblInputs is updated with the new value.

The pseudo code of the LID algorithm is given in Fig. 4.

Node LID(tblSymbolicExe, tblInputs, node, loopList)
// Assigns the type of a loop and determines the loop's iterations by
// introducing the next node that must be visited.
{
    // inputs:
    //   tblSymbolicExe ← a table that contains all variables and their values in every step of the current path
    //   tblInputs ← a table that contains the inputs and their values; all values are unknown initially
    //   node ← current node
    //   loopList ← a list of loop objects
    // output:
    //   Node ← the next node that must be visited by DFS
    // main code:
    foreach (L in loopList) {
        if (node == L.nodDecision) {
            if (L.loopType == unknown) {
                varInputDetector(L.lstVariables, tblSymbolicExe)
                if (read L.varInput's value from tblInputs == unknown) { loopStack.push(L); return null; }
                else L.loopType = dependent;
            } // end of if (L.loopType == unknown)
            if (L.loopType == dependent) {
                if (L.loopConstraint == true)   // according to tblSymbolicExe
                    return L.nodEntry;
                else
                    return L.nodExit;
            }
            else if (L.loopType == free)
                if (L.loopConstraint == undecided)
                    if (L.curIteration < L.maxIteration) { L.curIteration++; return L.nodEntry; }
                    else { run solveCondition to determine L.varInput's value and update tblInputs; return L.nodExit; }
        } // end of if (node == L.nodDecision)
    }
    LS = top element in loopStack;
    if (node == LS.nodLast && LS.loopType == unknown) {
        loopStack.pop();
        if (read LS.varInput's value from tblInputs == unknown) {
            LS.loopType = free;
            read K from tester to define LS.maxIteration;
            return null;
        }
    }
    if (node == LS.nodExit && LS.loopType == unknown) {
        run solveCondition to determine LS.varInput's value and update tblInputs;
    }
    return null;
}

Figure 4. LID algorithm
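The paper describes varInputDetector only in prose. One possible reading, given here as a hedged sketch, is to collect the input variables that occur in the symbolic values of the loop variables and to report a dependency only if exactly one input variable is involved. The representation of a symbolic value as a dictionary of coefficients (with the key "const" for the constant term), the extra parameter `input_names` and the return convention are all assumptions of this sketch.

```python
def var_input_detector(lst_variables, symbolic_column, input_names):
    """Return the single input variable the loop variables depend on, else None.

    `symbolic_column` maps each program variable to its symbolic value at the
    end of the current path, written as {input name: coefficient, "const": c};
    None means the value is still unknown.
    """
    used = set()
    for var in lst_variables:
        value = symbolic_column.get(var)
        if value is None:
            return None                      # some loop variable is unknown
        used |= {name for name, coef in value.items()
                 if name in input_names and coef != 0}
    # Several inputs (or none) are outside the scope of the method.
    return used.pop() if len(used) == 1 else None


# Column "1,2" of Table II: i = 0, j unknown, n = n.
after_1_2 = {"i": {"const": 0}, "j": None, "n": {"n": 1, "const": 0}}
# Column for the path 1,2,3,4: j = n-1.
after_1_2_3_4 = {**after_1_2, "j": {"n": 1, "const": -1}}

print(var_input_detector(["i", "n"], after_1_2, {"n"}))      # n    (loop L1)
print(var_input_detector(["j", "i"], after_1_2_3_4, {"n"}))  # n    (loop L2)
print(var_input_detector(["j", "i"], after_1_2, {"n"}))      # None (j unknown)
```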

To give a better view of the algorithm, one of the paths of the bubble sort CFG in Fig. 2 is selected with the LID algorithm.

The bubble sort CFG contains two loops, which are detailed in Table III. As mentioned earlier, the DFS algorithm traverses the CFG of the program. The DFS therefore visits node 1 and sends it to LID as the first input. LID reacts only when it meets decision, entry or exit nodes and returns null in other situations. Thus, LID returns null after visiting node 1, and the DFS takes no specific action and follows its routine. The next node visited by the DFS is node 2. Since node 2 is the decision node of loop L1 and the type of L1 is unknown, the input variable of L1 must be determined. The varInputDetector function selects n as the input variable of L1, and LID assigns it (L1.varInput=n). The value of n is unknown in tblInputs; thus, L1 is pushed onto the stack. The return value of LID is still null, so the DFS visits the next node routinely.

For the sake of explanation, it is assumed that the DFS selects node 3 at this stage. Node 3 is in the same situation as node 1, and LID returns null again. The next node is node 4, which is the decision node of loop L2; in the same way as L1, L2 is pushed onto the stack and its input variable is set to n (L2.varInput=n). In the next stage, nodes 5 and 6 are visited. The return value of LID after visiting nodes 4, 5 and 6 is still null. So far, the path 1,2,3,4,5,6 has been traversed.

According to the DFS method, the next node to be visited is node 7, which is the last node of loop L2. L2 is the top element of loopStack, its type is unknown, and its input variable (n) is undecided in tblInputs. Thus, its type is set to free and its number of iterations is decided by the tester, which is assumed to be 2 in this example. This value is saved in the maxIteration field of L2 (LID still returns null). Next, node 4, the decision node of L2, is visited again, and the type of L2 is now free. L2 has iterated once, which is less than L2.maxIteration; therefore, the iteration of L2 should continue. In this case, LID returns the entry node of L2 (node 5). Consequently, the DFS visits only node 5 and refuses to traverse the other nodes. Up to this point, the path 1,2,3,4,5,6,7,4,5 has been traversed.

Next, nodes 6 and 7 are traversed. When LID meets node 4 again, the loop has completed its iterations and the program flow should exit from the loop. At this stage, the solveCondition function restates the constraints of L2 in terms of symbolic expressions (input variables and constants). In the first iteration, the constraint j>i is restated as n-1>0, and in the second and third iterations the constraints are n-2>0 and n-3≤0 respectively; the last one is the failing loop condition that ends the loop. According to these constraints, n is assigned the value 3, which is then saved in tblInputs. The return value of LID is the exit node of L2 (node 8).

The loop type and current iteration of L2 are reset when LID meets the exit node. So far, the path 1,2,3,4,5,6,7,4,5,6,7,4,8 has been traversed.

Although node 8 is the last node of L1 (the top element of the loop stack), the type of L1 cannot be set to free, because the input variable of L1 (n) has already been determined. Next, node 2, the decision node of L1, is visited. L1 is known to be a dependent loop at this point, so the return value of LID is determined by the loop constraint. Here the loop constraint i<n-1 is restated, based on tblSymbolicExe and tblInputs, as 1<2.

Since the constraint of L1 is satisfied, the return value of LID is the entry node of L1 (node 3). After that, the decision node of L2 (node 4) is visited. Because the input variable of L2 (n) now has a value in tblInputs, the type of L2 is set to dependent. Therefore, when the decision node of L2 is visited, LID selects the entry or exit node of L2 depending on whether the constraint of L2 is satisfied. Since n=3, the constraint of L2 is 2>1 at this time, so LID returns node 5, the entry node of L2. So far, the path 1,2,3,4,5,6,7,4,5,6,7,4,8,2,3,4 has been traversed. Following this routine, the loop constraints at nodes 4 and 2 become 1>1 and 2<2 respectively, and both fail. With L2 and L1 finished, a complete and feasible path is detected, which consists of the following sequence of nodes:

1,2,3,4,5,6,7,4,5,6,7,4,8,2,3,4,5,6,7,4,8,2,9.
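The outcome of this walkthrough can be reproduced with a short script. It is a simplification under stated assumptions rather than an implementation of LID: `solve_condition` replaces the symbolic solveCondition step by a brute-force search over n, `walk` replays the CFG of Table I for the resulting concrete n, and node 5 is assumed to always take the swap branch (for n=3 a reversed array such as 3,2,1 behaves this way). All function names and the bound `max_n` are illustrative.

```python
def inner_loop_iterations(n, i):
    """Times node 4 branches to node 5 when j starts at n-1 and must stay > i."""
    return max(0, (n - 1) - i)


def solve_condition(k, i=0, max_n=100):
    """Brute-force stand-in for solveCondition: smallest n with exactly k
    inner-loop iterations during the given outer iteration i."""
    for n in range(max_n + 1):
        if inner_loop_iterations(n, i) == k:
            return n
    return None


def walk(n, swap=True):
    """Follow the bubble sort CFG for a concrete n and return the node sequence."""
    path, i, j, node = [], None, None, 1
    while True:
        path.append(node)
        if node == 1:
            i, node = 0, 2
        elif node == 2:
            node = 3 if i < n - 1 else 9
        elif node == 3:
            j, node = n - 1, 4
        elif node == 4:
            node = 5 if j > i else 8
        elif node == 5:
            node = 6 if swap else 7   # assumption: fixed outcome at node 5
        elif node == 6:
            node = 7
        elif node == 7:
            j, node = j - 1, 4
        elif node == 8:
            i, node = i + 1, 2
        else:                         # node 9, the end node
            return path


n = solve_condition(k=2)              # the tester picks 2 iterations for L2
print(n)                              # 3
print(",".join(map(str, walk(n))))
# 1,2,3,4,5,6,7,4,5,6,7,4,8,2,3,4,5,6,7,4,8,2,9
```

Once n is fixed by the tester's choice for the first free loop, every later loop decision is evaluated rather than chosen, which is why the remaining iterations of L1 and L2 in the walkthrough behave as dependent loops.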


A. Time and Space Complexity

This procedure is executed each time a node of the CFG is visited. Its worst-case time complexity for one node visit is O(l + v + i), where l, v and i are the number of loops, the number of variables and the number of input variables in the main program respectively. These values are small in most modules; therefore, LID imposes no significant time overhead on the per-node processing of previous methods. The time needed to update the symbolic execution table is not counted here, because it is also required by earlier techniques.

Regarding space complexity, the space required to store tblSymbolicExe and tblInputs is O(v*p) and O(2*i) respectively, where p denotes the maximum length of a path. As stated, the time complexity of LID is appropriate and acceptable.

IV. CONCLUSION

This article presents a new approach for selecting paths with loops such that the selected paths are feasible and executable. By traversing the CFG and using the symbolic evaluation procedure, this dynamic method determines the types of the loops and their numbers of iterations. By avoiding infeasible paths, it can significantly reduce the cost of automatic software testing.

Future work can proceed in two directions. First, the LID algorithm cannot handle loops with combined constraints; there is therefore an opportunity to find solutions that deal with all kinds of loops. Second, the presented algorithm only avoids infeasible paths that result from an inappropriate choice of the number of loop iterations. Solutions that avoid other kinds of infeasible paths can be presented in the future.

REFERENCES

[1] G. Myers, "The art of software testing", John Wiley and Sons, second edition, 2004.

[2] I. Burnstein, "Practical software testing: a process-oriented approach", Springer, 2003.

[3] I. Sommerville, "Software engineering", Proceedings of the 24th International Conference on Software Engineering ICSE 02, Addison-Wesley, 2007.

[4] M.H. Liu, Y.F. Gao, J.H. Liu, L. Zhang and J.S. Sun, "An approach to test data generation for killing multiple mutants", IEEE Computer Society, International Conference on Software Maintenance, 2006.

[5] B. Botella, A. Gotlieb and C. Michel, "Symbolic execution of floating point computations", Software Testing, Verification and Reliability, vol. 16, pp. 97-121, 2006.

[6] J. Zhang, C. Xu and X. Wang, "Path-oriented test data generation using symbolic execution and constraint solving techniques", International Conference on Software Engineering and Formal Methods, pp. 242-250, 2004.

[7] A. J. Offutt, Z. Jin and J. Pan, "The dynamic domain reduction procedure for test data generation", Software Practice and Experience, John Wiley & Sons, vol. 29, pp. 167-193, 1999.

[8] A. Kumar, S. Tiwari, K.K. Mishra and A.K. Misra, "Generation of efficient test data using path selection strategy with elitist GA in regression testing", Computer Science and Information Technology (ICCSIT), 3rd IEEE International Conference on, pp. 389-393, 2010.

[9] A. Bertolino, "Software testing research: achievements, challenges, dreams", IEEE Transactions on Software Engineering, pp. 85-103, 2007.

[10] P. Bueno, M. Jino and W. E. Wong, "Diversity oriented test data generation using metaheuristic search techniques", Information Sciences, In Press, 2011.

[11] M. Ngo and H. Tan, "Heuristics-based infeasible path detection for dynamic test data generation", Information and Software Technology, Elsevier, pp. 641-655, 2008.

[12] A. Gotlieb, T. Denmat and B. Botella, "Goal-oriented test data generation for programs with pointer variables", 29th Annual International Computer Software and Applications Conference COMPSAC 05, IEEE, vol. 1, pp. 449-454, 2005.

[13] R. DeMillo and J. Offutt, "Constraint-based automatic test data generation", IEEE Transactions on Software Engineering, vol. 17, pp. 900-910, 1991.

[14] B. Korel, "Automated software test data generation", IEEE Transactions on Software Engineering, vol. 16, pp. 870-879, 1990.

[15] P. McMinn, "Search-based software test data generation: a survey", Software Testing, Verification and Reliability, John Wiley & Sons, vol. 14, pp. 105-156, 2004.

[16] J. Edvardsson, "A survey on automatic test data generation", Second Conference on Computer Science and Engineering in Linkoping (ECSEL), pp. 21-28, 1999.