Week6 testing-intro

Combining the strengths of UMIST and The Victoria University of Manchester

COMP23420 Sem 2 week 6: Software testing concepts

John Sargeant

[email protected]

REMINDER: PLEASE ENSURE THAT YOUR PHONE IS SWITCHED OFF


Overview

• Testing strategy

• Practical issues

• Safety critical systems

• Basic testing techniques

• Kinds of testing



This software has bugs in…

Embury’s law: “This software has bugs in, we just don’t know what they are yet”.

Applies to any SW system of any size, e.g.

• Eurofighter fuel control system 80K lines

• Modern airliner ~10M lines

• ABC ~150K (our code) + ~300K (Java library code)

Also, unlike e.g. rostering, most real systems are concurrent and reactive, not simple functions from inputs to outputs.



Testing to fail

The fundamental rule of testing: a successful test is one which causes the software to fail.

Difficult for programmers to do with their own code – not the same as debugging.

• Traditional solution: have a separate testing team who are as nasty to the software as possible (see Software testing, Ron Patton, Sams 2006 for a guide to this approach).

• Agile solution – write the tests before the code – also helps to clarify requirements.



How many bugs?

Traditional estimate is 3-5 per hundred lines of C code.

That’s 30,000 – 50,000 for a million line program!

• Of those ~90% will probably be found by routine debugging

• And ~90% of the rest by sensible testing

• And ~90% of the rest by really rigorous testing

But that still leaves 30-50, and getting to 99.9% is very expensive.
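The arithmetic above can be checked with a few lines of Java. This is only a sketch of the slide's rough estimate: the class and method names are made up for illustration, and the 90% figures are the ballpark numbers quoted above, not measurements.

```java
// Illustrative arithmetic only: the percentages are the rough estimates quoted above.
class BugEstimate {
    /** Bugs left after `stages` rounds, each of which finds `foundPercent`% of what remains. */
    static long remaining(long initialBugs, int foundPercent, int stages) {
        long bugs = initialBugs;
        for (int i = 0; i < stages; i++) {
            bugs = bugs * (100 - foundPercent) / 100;
        }
        return bugs;
    }

    public static void main(String[] args) {
        long lines = 1_000_000;
        long low = lines / 100 * 3;   // 3 bugs per hundred lines
        long high = lines / 100 * 5;  // 5 bugs per hundred lines
        // Three stages: routine debugging, sensible testing, really rigorous testing.
        System.out.println(remaining(low, 90, 3) + " to " + remaining(high, 90, 3) + " bugs remain");
    }
}
```

Each further 90% stage only removes a tenth of what the previous one did, which is one way to see why the last few bugs are so expensive to find.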



Factors affecting bug density

• Design – good designs lead to fewer bugs

• Type of application: concurrent, reactive systems are much more difficult than sequential transformation systems.

• Programming language: Java < C < Perl

• Programmer competence and experience

• In general expect 1-10 bugs per hundred lines

• (Probably a lot less for pair programming, but at a factor of 2 cost).



When you can’t afford 30-50 bugs

Major issue for safety critical systems, e.g. fly-by-wire.

• Conventional aircraft: pilot input directly controls flight surfaces (via hydraulics in large aircraft).

• FBW aircraft: a computer interprets the pilot’s inputs, and relays these electronically to the flight surfaces

• First used in the F16, 1974. Allows military aircraft to be inherently unstable, hence more manoeuvrable (also helps stealth).

• First civilian application Airbus A320, early 1980s. Provides protection and reduces pilot workload.



Triple redundancy (1)

• Airbus claimed that the A320 FBW software was designed to fail no more than once in 10^9 flight hours. How could they possibly claim that?

• Once in 10^4.5 (about 32K) hours might be plausible – but nowhere near enough.

• But add a second computer, with different software written by a different team. Now (in theory) both will fail at the same time only once in 10^9 hours.

• But if there is a discrepancy you don’t know which one’s wrong – so you need a third computer and take a majority vote: triple redundancy.
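A minimal sketch of the majority-vote idea, in Java. This is not how the A320 actually does it: the class name, the use of a numeric tolerance, and the "no agreement" result are all assumptions made for illustration.

```java
import java.util.OptionalDouble;

// Hypothetical voter: three independently computed outputs, accept any value two of them agree on.
class MajorityVoter {
    /** Returns a value that at least two channels agree on (within tolerance), or empty if all three disagree. */
    static OptionalDouble vote(double a, double b, double c, double tolerance) {
        if (Math.abs(a - b) <= tolerance) return OptionalDouble.of(a);
        if (Math.abs(a - c) <= tolerance) return OptionalDouble.of(a);
        if (Math.abs(b - c) <= tolerance) return OptionalDouble.of(b);
        return OptionalDouble.empty(); // no majority: the caller must fall back, e.g. to manual control
    }

    public static void main(String[] args) {
        // The third channel disagrees, so it is outvoted by the other two.
        System.out.println(vote(2.0, 2.0, 7.5, 0.001));
        // All three disagree, so there is nothing to trust.
        System.out.println(vote(1.0, 2.0, 3.0, 0.001));
    }
}
```

With only two channels the first case would be a discrepancy with no way to tell which side is wrong; the third channel is what turns detection into a decision.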



Triple redundancy (2)

Other advantages:

• Provides graceful degradation: don’t have to revert to manual control immediately with one computer out.

• Gives continuous testing for free – each discrepancy reveals a bug! So eventually the system should be extremely reliable – recent safety record of the A320 is outstanding.

• Redundancy is essential to providing reasonable levels of safety in complex safety-critical systems.

• Note: the actual A320 system has a lot more redundancy than described above



Quiz(1)

Suggest at least three reasons why the theoretical calculation 10^4.5 x 10^4.5 = 10^9 may not reflect what happens in practice. Hints:

• Is there still a single point of failure in the system?

• Remember: not all bugs are in the actual software

• When is it true that P(A and B) = P(A) x P(B)?



Exhaustive testing is impossible

public static double divide(double a, double b) {
    return a / b;
}

A Java double is 64 bits, so the two arguments give 2^128 possible inputs – intractable.

Similarly a reactive system such as FBW has a huge number of possible (state, input, time) combinations.

So we have to find a large number of bugs within a huge search space – we have to focus effort on the most “interesting” parts of that space.
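To get a feel for the size of that space, here is a rough calculation in Java. The rate of one billion tests per second is an assumed, deliberately generous figure, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

// Back-of-envelope sizing of the input space for a method taking two 64-bit doubles.
class InputSpace {
    /** Number of distinct (a, b) bit patterns: 2^64 * 2^64 = 2^128. */
    static BigInteger pairsOfDoubles() {
        return BigInteger.ONE.shiftLeft(128);
    }

    /** Whole years needed to try every pair at the given (assumed) test rate. */
    static BigInteger yearsToTestAll(long testsPerSecond) {
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        return pairsOfDoubles()
                .divide(BigInteger.valueOf(testsPerSecond))
                .divide(secondsPerYear);
    }

    public static void main(String[] args) {
        System.out.println("Input pairs: " + pairsOfDoubles());
        System.out.println("Years at 10^9 tests/second: " + yearsToTestAll(1_000_000_000L));
    }
}
```

The answer comes out on the order of 10^22 years, which is why the search has to be targeted rather than exhaustive.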



Testing as a search problem

• Equivalence partitioning: split up the space into areas where similar tests are likely to lead to similar results

• e.g. if 2/3 works then 3/5 probably works too (but not necessarily 3/3)

• Boundary value analysis: concentrate on boundaries between different parts of the space.

• E.g. b == 0, b very close to 0, a and/or b close to Double.MAX_VALUE etc.
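As a sketch, the boundary cases listed above might be written as checks against the divide method from the earlier slide. The class name and the check helper are made up for illustration, and in practice these would more likely be JUnit tests; Double.MAX_VALUE and Double.MIN_VALUE are the standard Java constants for the largest finite and smallest positive double.

```java
// Hypothetical boundary-value checks for the divide method shown earlier.
class DivideBoundaryChecks {
    public static double divide(double a, double b) {
        return a / b;
    }

    /** Fails loudly if a check does not hold. */
    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError("failed: " + what);
    }

    public static void main(String[] args) {
        // One representative of the "ordinary finite operands" partition.
        check(divide(6.0, 3.0) == 2.0, "ordinary division");

        // Boundary: b == 0. Floating point division does not throw; it yields infinity or NaN.
        check(divide(1.0, 0.0) == Double.POSITIVE_INFINITY, "positive / zero");
        check(divide(-1.0, 0.0) == Double.NEGATIVE_INFINITY, "negative / zero");
        check(Double.isNaN(divide(0.0, 0.0)), "zero / zero");

        // Boundary: b very close to 0. The true result is too large for a double.
        check(Double.isInfinite(divide(1.0, Double.MIN_VALUE)), "divide by smallest positive double");

        // Boundary: a close to the largest finite double.
        check(divide(Double.MAX_VALUE, 0.5) == Double.POSITIVE_INFINITY, "overflow on large a");
        check(divide(Double.MAX_VALUE, Double.MAX_VALUE) == 1.0, "max / max");

        System.out.println("all boundary checks passed");
    }
}
```

Whether infinity and NaN are acceptable results at these boundaries is a question for the spec, which is exactly the kind of thing boundary tests tend to expose.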



Black box or white box?

Black box: testing the software against its spec without access to the code:

• Means tests will be written without preconceptions about how the code works.

• If the code is changed, the same tests are still valid.

White box: testing with access to the code:

• Allows more tests to be done

• Allows tester to apply pressure to those places which look most likely to break.



Quiz(2)

2. You are testing an algorithm which sorts strings into alphanumeric order for a dictionary program. Suggest some of the most important tests you’ll need to do.

3. You are asked to thoroughly test floating point division software which will be burnt into a processor chip. Would you go about this primarily through black box or white box testing, and why?



Kinds of testing

• Unit testing – test one unit (in OO one class) at a time.

• Integration testing – test that the components of a system (or subsystem) work together correctly

• Regression/smoke testing – check that you haven’t broken it.

• System testing – test that the system works in the context in which it will be required to work.

• Alpha and beta testing – test with real users

• Acceptance testing – get the customer to come up with the dosh.



Unit testing

• Testing one unit – class – at a time.

• Relatively simple, but the class you’re testing will usually rely on other classes.

• In general almost all software relies on other software (e.g. Java library classes).

• The search space is generally well defined so techniques like EP and BVA are most useful here.

• Often possible to be systematic and reasonably confident that a single class is bug-free.

• In Java, often done with the JUnit testing framework.



Integration testing

• Testing that the components of a system work together.

• Harder to define than unit testing; shape of testing space is less obvious.

• Concentrate on important mission-critical features

• Check that the use cases can be performed without problems.

• Don’t get upset when your code causes a problem; don’t get annoyed when somebody else’s does.



Regression/smoke testing

• Regression testing: repeat the tests you did before, to make sure you haven’t broken anything.

• Especially important after significant changes but the more often the better.

• Smoke testing: repeat the most critical tests as often as possible – check that it’s not going up in smoke.

• Integration and Regression/smoke testing are often done in the form of a daily (or nightly) “build” of the system – requires that at least some tests are automated.
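A minimal sketch of what "automated" could mean for a nightly build: a list of named checks that is re-run in full every time, with the failures reported. Everything here (the class name, the map of checks, the example checks themselves) is hypothetical; a real project would normally use a test framework and a build tool rather than hand-rolling this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical runner: re-run every recorded check and list the ones that no longer pass.
class NightlyChecks {
    /** Runs every named check and returns the names of those that failed, in order. */
    static List<String> failures(Map<String, BooleanSupplier> checks) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean ok;
            try {
                ok = entry.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // a crash counts as a failure, and the remaining checks still run
            }
            if (!ok) failed.add(entry.getKey());
        }
        return failed;
    }

    public static void main(String[] args) {
        // Stand-in checks; a real suite would exercise the system under test.
        Map<String, BooleanSupplier> regression = new LinkedHashMap<>();
        regression.put("upper-casing", () -> "abc".toUpperCase().equals("ABC"));
        regression.put("integer parsing", () -> Integer.parseInt("42") == 42);

        List<String> failed = failures(regression);
        System.out.println(failed.isEmpty() ? "build OK" : "broken: " + failed);
    }
}
```

A smoke test in this scheme would simply be a smaller map holding only the most critical checks, cheap enough to run far more often than the full regression set.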



System testing

• Testing of the system in the context(s) in which it will operate.

• This will generally be a lot more varied than the context in which it was developed.

• May involve different hardware, operating systems, performance issues etc.

• Need to check the documentation and procedures as well as the code.

• Many systems which (seem to) work perfectly in a development environment fail in a customer environment.



Alpha and beta testing

• System testing with real users

• Important because they don’t use the software the way you assume.

• Alpha testing: a small group of users, done with SW developers present.

• Beta testing: a wider group, remote from the development team, asked to submit bug reports.

• Better not to start Beta testing until the software will work for most of the users most of the time!



Acceptance testing

Where a SW product has a small number of large customers (e.g. Campus Solutions) the customer(s) may specify a set of tests which the SW must pass before they will accept it and pay the dosh. There are some serious issues with this:

• In general, users don’t really know in advance what they want (the “waterfall fallacy”).

• Who within the customer organisation defines the spec? e.g. Managers and end users will have different views.

• Fixating on passing the acceptance test could result in serious problems being missed.
