bbst2_2004.ppt

7/25/2019 bbst2_2004.ppt

http://slidepdf.com/reader/full/bbst22004ppt 1/29

1Black Box Software Testing Copyright © 2003 Cem Kaner & James

Black Box Software Testing

Fall 2004by

Cem Kaner, !"!, #h!"!#rofessor of Software $ngineering

%lori&a 'nstit(te of Technology

an&

ames Bach#rincipal, Satisfice 'nc!

Copyright (c) Cem Kaner & James Bach, 2000-2004

This work is license& (n&er the Creati)e Commons *ttrib(tion+Share*like icense! To )iew a copy

of this license, )isit http-..creati)ecommons!org.licenses.by+sa.2!0. or sen& a letter to Creati)eCommons, // 1athan *bbott ay, Stanfor&, California 30/, 4S*!

These notes are partially base& on research that was s(pporte& by 1S% 5rant $'*+0663/3

'T7.S89#$- :'mpro)ing the $&(cation of Software Testers!: *ny opinions, fin&ings an& concl(sions

or recommen&ations expresse& in this material are those of the a(thor;s< an& &o not necessarily

reflect the )iews of the 1ational Science %o(n&ation!

http://www.testingeducation.org/




7/25/2019 bbst2_2004.ppt



Black Box Software Testing

Part 2

Complete Testing Is Impossible

7/25/2019 bbst2_2004.ppt



Complete testing?What do we mean by "complete testing"?

= Complete :co)erage:- Teste& e)ery line . branch . basis path>

= Testers not fin&ing new b(gs>

= Test plan complete>

= *fter all, if there are more b(gs, yo( can fin& them if yo( &o more

testing! So testing co(l&n?t yet be :complete!:

Complete testing must mean that, at the end of

testing, you know there are no remaining

unknown bugs.

7/25/2019 bbst2_2004.ppt



Complete coverage?Some people (try to) simplify away the problem of

complete testing by advocating "complete coverage."

= hat is co)erage>

– Extent of testing of certain attributes or pieces of the program, suchas statement coverage or branch coverage or condition coverage.

– Extent of testing completed, compared to a population of possibletests.

= Typical &efinitions are o)ersimplifie&! They miss, for example,

– Interrupts and other parallel operations

– Interesting data values and data combinations

– Missing code= The n(mber of )ariables we might meas(re is st(nning! '

;Kaner< liste& 606 examples in Software Negligence & TestingCoverage.

7/25/2019 bbst2_2004.ppt



Measuring and achieving high

coverage

Coverage measurement is a good tool to show how far you are from complete testing.

= B(t it@s a lo(sy tool for in)estigating how close yo( are to completion!

= "ri)ing testing to achie)e Ahigh co)erage is likely to yiel& a mass of

low+power tests!– People optimize what we measure them against, at the expense of what

we don’t measure.

= %or more on meas(rement &istortion an& &ysf(nction, rea& Bob

*(stin@s book, Measurement and Management of Performance in

Organizations.

– Brian Marick discusses this and other problems with this and several

other issues in his papers at www.testing.com (e.g. How to Misuse Code

Coverage). Marick has been involved in development of several of the

commercial coverage tools.



7/25/2019 bbst2_2004.ppt



What about bug find rates?Some people measure completeness of testing

with bug curves: = 1ew b(gs fo(n& per week ;:"efect arri)al rate:<

= B(gs still open ;each week<

= 7atio of b(gs fo(n& to b(gs fixe& ;per week<

7/25/2019 bbst2_2004.ppt

http://slidepdf.com/reader/full/bbst22004ppt 7/297Black Box Software Testing Copyright © 2003 Cem Kaner & James

Weibull reliability modelBug curves can be useful progress indicators,

but some people fit the data to theoretical curvesto determine when the project will complete.

The model’s assmptions

= Testing occ(rs in a way that is similar to the way the software will beoperate&!

= *ll &efects are e(ally likely to be enco(ntere&!

= *ll &efects are in&epen&ent!

= There is a fixe&, finite n(mber of &efects in the software at the start oftesting!

= The time to arri)al of a &efect follows the eib(ll &istrib(tion!

= The n(mber of &efects &etecte& in a testing inter)al is in&epen&ent ofthe n(mber &etecte& in other testing inter)als for any finite collection ofinter)als!

– See Erik Simmons, When Will We Be Done Testing? Software Defect Arrival Modelling with the Weibull Distribution.

7/25/2019 bbst2_2004.ppt


The Weibull modelI think it’s absurd to rely on a distributional

model (or any model) when every assumption itmakes about testing is obviously false.= Dne of the a&)ocates of this approach points o(t that

“Luckily, the Weibull is robust to most violations.”

– This illustrates the use of surrogate measures—we don’t have an

attribute description or model for the attribute we really want tomeasure, so we use something else, that is allegedly “robust”, in its

place. This can be very dangerous

– The Weibull distribution has a shape parameter that allows it to take a

very wide range of shapes. If you have a curve that generally rises then

falls (one mode), you can approximate it with a Weibull.

B4T E*T "D$S TE*T T$ 4S> ED SED4" $ '1T$7#7$T

'T>

7/25/2019 bbst2_2004.ppt


Side effects of bug curvesEarlier in testing, the pressure is to increase

bug counts. In response, testers will:

= 7(n tests of feat(res known to be broken or incomplete!

= 7(n m(ltiple relate& tests to fin& m(ltiple relate& b(gs!

= ook for easy b(gs in high (antities rather than har& b(gs!

= ess emphasis on infrastr(ct(re, a(tomation architect(re, tools an&more emphasis of b(g fin&ing! ;Short term payoff b(t long term

inefficiency!<

– For more on measurement dysfunction, read Bob Austin’s book,

Measurement and Management of Performance in Organizations.

– For more observations of problems like these in reputable

software companies, see Doug Hoffman's article, The Dark Side of

Software Metrics.

7/25/2019 bbst2_2004.ppt


Side effects of bug curvesLater in testing, the pressure is to decrease the

new bug rate:= 7(n lots of alrea&y+r(n regression tests!= "on@t look as har& for new b(gs!

= Shift foc(s to appraisal, stat(s reporting!

= Classify (nrelate& b(gs as &(plicates!

= Class relate& b(gs as &(plicates ;an& close&<, hi&ing key &ata abo(t

the symptoms . ca(ses of the problem!

= #ostpone b(g reporting (ntil after the meas(rement checkpoint

;milestone<! ;Some b(gs are lost!<

= 7eport b(gs informally, keeping them o(t of the tracking system!

= Testers get sent to the mo)ies before meas(rement checkpoints!= #rogrammers ignore b(gs they fin& (ntil testers report them!

= B(gs are taken personally!

= Fore b(gs are reGecte&!

7/25/2019 bbst2_2004.ppt


Bad models are counterproductive

7/25/2019 bbst2_2004.ppt


Testers live and breathe tradeoffsThe time needed for test-related tasks is

infinitely larger than the time available.

$xample- Time yo( spen& on

- analyzing, troubleshooting, and effectively describing a failure

's time no longer a)ailable for

- Designing tests - Documenting tests

- Executing tests - Automating tests

- Reviews, inspections - Supporting tech support

- Retooling - Training other staff

7/25/2019 bbst2_2004.ppt


Let's consider the nature of the

infinite set of tests

There are enormous numbers of possible tests.To test everything, you would have to:

= Test e)ery possible inp(t to e)ery )ariable ;incl(&ing o(tp(t

)ariables an& interme&iate res(lts )ariables<!

= Test e)ery possible combination of inp(ts to e)ery combinationof )ariables!

= Test e)ery possible se(ence thro(gh the program!

= Test e)ery har&ware . software config(ration, incl(&ing

config(rations of ser)ers not (n&er yo(r control!

= Test e)ery way in which any (ser might try to (se the program!

» Read The Impossibility of Complete Testing

7/25/2019 bbst2_2004.ppt


Inputs to individual variables

Consider the “valid” inputs

"o(g Eoffman worke& for F*S#*7 ;the Fassi)ely #arallel comp(ter,

HK parallel processors<! This machine is (se& for mission+critical an&

life+critical applications!

–To test the 32-bit integer square root function, Hoffman checked all values(all 4,294,967,296 of them). This took the computer about 6 minutes to run

the tests and compare the results to an oracle.

–There were 2 (two) errors, neither of them near any boundary. (The

underlying error was that a bit was sometimes mis-set, but in most error

cases, there was no effect on the final calculated result.) Without anexhaustive test, these errors probably wouldn’t have shown up.

–What about the 64-bit integer square root? How could we find the time to

run all of these? If we don't run them all, don't we risk missing some bugs?

7/25/2019 bbst2_2004.ppt


Inputs to individual variablesMore complex examples

= $aster $ggs– Bizarre inputs, by design

= $&ite& inp(ts

– These can be quite complex. How much editing is enough?

= Iariations on inp(t timing– Try entering the data very quickly, or very slowly. Enter them

before, after, and during the processing of some other event, or just as the time-out interval for this data item is about to expire.

= 1ow, what abo(t all the error han&ling that yo( can trigger with

:in)ali&: inp(ts>– Think about Whittaker & Jorgensen's constraint-focused attacks

(Whittaker, How Software Fails)

– Think about Jorgensen's hostile data stream attacks

7/25/2019 bbst2_2004.ppt


When people challenge extreme value

tests…

“No user would do that.”

“No user I can think of , who I like,

would do that on purpose.”!ho aren’t yo thin"ing o#$

!ho don’t yo li"e %ho might really se this prodct$

!hat might good sers do y accident$

7/25/2019 bbst2_2004.ppt


Combination testingVariables interact.

= $xample 6- a program crashe& when attempting to print pre)iew a high

resol(tion ;back then, H00xH00 &pi< o(tp(t on a high resol(tion screen! The

option selections for printer resol(tion an& screen resol(tion were interacting!

= $xample 2- *merican *irlines co(l&n@t print tickets if a string concatenating the

fares associate& with all segments was too long!

= $xample 3- Femory leak in or&Star if text was marke& Bol& . 'talic ;rather

than 'talic . Bol&<

= S(ppose there are 1 )ariables!

– Suppose the number of choices for the variables are V1, V2, through VN.

– The total number of possible combinations is V1 x V2 x . . . x VN. This is huge.= * fiel& that accepts only J6, 2, 3 an& another that accepts only J*, B, C yiel&s cases,

6*, 6B, 6C, 2*, 2B, 2C, 3*, 3B, an& 3C!

= Combine two fiel&s that accept one &igit ;0 to < each, yiel&s 60 x 60 L 600 possible

combinations!

= 36M,N,/H,000 possible combinations of the first fo(r mo)es in chess!

7/25/2019 bbst2_2004.ppt


Combination testingVariables interact.

S(ppose there are 1 )ariables!

= abel the n(mber of choices for the )ariables as I6, I2

thro(gh I1!

= The total n(mber of possible combinations is

I6 x I2 x ! ! ! x I1!

This is h(ge!

– A field that accepts only {1, 2, 3} and another that accepts only

{A, B, C} yields 9 cases, 1A, 1B, 1C, 2A, 2B, 2C, 3A, 3B, & 3C.

– Combine two fields that accept one digit (0 to 9) each, yields 10 x

10 = 100 possible combinations.

– 318,979,564,000 possible combinations of the first four moves in

chess.

7/25/2019 bbst2_2004.ppt


Sequences

*

B

C

"

$

%

5

E

'

O $O'T

P 20 times

thro(gh the

loop

Eere@s an example that shows that there are too many paths to

test in e)en a fairly simple program! This is from Fyers, The Art

of Software Testing.

7/25/2019 bbst2_2004.ppt



SequencesMyers’ example in pseudocode

The program starts at *!From A it can go to B or C

From B it goes to X

From C it can go to D or E

From D it can go to F or G

From F or from G it goes to XFrom E it can go to H or I

From H or from I it goes to X

From X the program can go to

EXIT or back to A. It can go back

to A no more than 19 times.'ne path is *BO+$xit There are %ays to get to * and then to the +*Tin one pass

nother path is *BO*C"%O+$xit There are %ays to get to * the #irsttime, more to get ac" to * the second time, so there are . / 2cases li"e this!

7/25/2019 bbst2_2004.ppt



SequencesAnalyzing Myers’ example

= There are /6 9 /2 9 !!! 9 /6 9 /20 L 6063 L 600 trillion paths

thro(gh the program!

= 't wo(l& take only a billion years to test e)ery path ;if one co(l&

write, exec(te an& )erify a test case e)ery fi)e min(tes<!

7/25/2019 bbst2_2004.ppt



Phone System: The Telenova

Stack Failure

Telenova Station Set 1. Integrated voice and data.108 voice features, 110 data features. 1985.

7/25/2019 bbst2_2004.ppt



The Telenova Stack Failure

Context-sensitive

display10-deep hold queue

10-deep wait queue

7/25/2019 bbst2_2004.ppt



The Telenova Stack FailureThe bug that triggered the simulation:

Beta c(stomer ;a stock broker< reporte& ran&om fail(res= Co(l& be fre(ent at peak times

= *n in&i)i&(al phone wo(l& crash an& reboot, with other phones crashing while thefirst was rebooting

= Dn a partic(larly b(sy &ay, ser)ice was &isr(pte& all ;$ast Coast< afternoon

e were mystifie&-

= *ll in&i)i&(al f(nctions worke&

= e ha& teste& all lines an& branches!

4ltimately, we fo(n& the b(g in the hol& (e(e

= 4p to 60 calls on hol&, each a&&s recor& to the stack

= 'nitially, the system checke& stack whene)er call was a&&e& or remo)e&, b(t this

took too m(ch system time! So we &roppe& the checks an& a&&e& these– Stack has room for 20 calls (just in case)

– Stack reset (forced to zero) when we knew it should be empty

= The error han&ling ma&e it almost impossible for (s to &etect the problem in thelab! Beca(se we co(l&n@t p(t more than 60 calls on the stack ;(nless we knew themagic error<, we co(l&n@t get to 26 calls to ca(se the stack o)erflow!

7/25/2019 bbst2_2004.ppt



The Telenova Stack FailureA simplified state diagram showing the bug

Callerhung up

Idle

Connected

On Hold

Ringing

You

hung up

7/25/2019 bbst2_2004.ppt



Telenova Stack Failure

Idle

Connected

On Hold

Ringing Callerhung up

You

hung up

7/25/2019 bbst2_2004.ppt



Telenova Stack FailureWhy are we spending so much time on this

example?

= Beca(se it ill(strates se)eral important points-

– Simplistic approaches to path testing can miss critical defects.

– Critical defects can arise under circumstances that appear (in a testlab) so specialized that you would never intentionally test for

them.– When (in some future course or book) you hear a new

methodology for combination testing or path testing, I want you totest it against this defect. If you had no suspicion that there was astack corruption problem in this program, would the new method

lead you to find this bug?

• This example also lays a foundation for our introduction to random /statistical testing.

7/25/2019 bbst2_2004.ppt



Telenova Stack Failure

Having found and fixedthe hold-stack bug,

should we assume

that we’ve taken care of the problem

or that if there is one long-sequence bug,

there will be more?

Hmmm…

If you kill a cockroach in your kitchen,

do you assumeyou’ve killed the last bug?

Or do you call the exterminator?

7/25/2019 bbst2_2004.ppt


29Bl k B S f T i C i h © 2003 Cem Kaner & James

Conclusion

= Complete testing is impossible– There is no simple answer for this.

– There is no simple, easily automated, comprehensive oracle to

deal with it.

– Therefore testers live and breathe tradeoffs.

bbst2_2004.ppt

Documents

Transcript of bbst2_2004.ppt