Colloquium: Florida Tech Copyright © 2012 Cem Kaner

An Overview of High Volume Test Automation
(Early Draft: Feb 24, 2012)

Cem Kaner, J.D., Ph.D.
Professor of Software Engineering
Florida Institute of Technology

Acknowledgments: Many of the ideas presented here were developed in collaboration with Douglas Hoffman.
These notes are partially based on research that was supported by NSF Grant CCLI-0717613 “Adaptation & Implementation of an Activity-Based Online or Hybrid Course in Software Testing.” Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Abstract

This talk is an introduction to the start of a research program. Drs. Bond, Gallagher and I have some experience with high volume test automation, but we haven't done formal, funded research in the area. We've decided to explore it in more detail, with the expectation of supervising research students. We think this will be an excellent foundation for future employment in industry or at a university. If you're interested, you should talk with us.

Most discussions of automated software testing focus on automated regression testing. Regression tests rerun tests that have been run before. This type of testing makes sense for testing the manufacturing of physical objects, but it is wasteful for software. Automating regression tests *might* make them cheaper (if the test maintenance costs are low enough, which they often are not), but if a test doesn't have much value to begin with, how much should we be willing to spend to make it easier to reuse? Suppose we decided to break away from the regression testing tradition and use our technology to create a steady stream of new tests instead. What would that look like? What would our goals be? What should we expect to achieve?

This is not yet funded research; we are still planning our initial grant proposals. We might not get funded, and if we do, we probably won't get anything for at least a year. So, if you're interested in working with us, you should expect to support yourself (e.g. via GSA) for at least a year and maybe longer.
Typical Testing Tasks

Analyze product & its risks
• Benefits & features
• Risks in use
• Market expectations
• Interaction with external S/W
• Diversity / stability of platforms
• Extent of prior testing
• Assess source code

Develop testing strategy
• Pick key techniques
• Prioritize testing foci

Design tests
• Select key test ideas
• Create tests for each idea

Design oracles
• Mechanisms for determining whether the program passed or failed a test

Assess the tests
• Debug the tests
• Polish their design
• Evaluate any bugs found by them

Execute the tests
• Troubleshoot failures
• Report bugs
• Identify broken tests

Document the tests
• What test ideas or spec items does each test cover?
• What algorithms generated the tests?
• What oracles are relevant?

Maintain the tests
• Recreate broken tests
• Redocument revised tests

Manage test environment
• Set up test lab
• Select / use hardware/software configurations
• Manage test tools

Keep archival records
• What tests have we run?
• What collections / suites provide what coverage?
Regression testing

This is the most commonly discussed approach to automated testing:
• Create a test case
• Run it and inspect the output
• If the program fails, report a bug and try again later
• If the program passes the test, save the resulting outputs
• In future testing:
  – Run the program
  – Compare the output to the saved results
  – Report an exception whenever the current output and the saved output don't match
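The save-and-compare cycle above can be sketched in a few lines. This is a minimal illustration, not a real tool; the function and file names are hypothetical, and the "inspect and approve" step on first run is assumed to happen before the baseline is trusted.

```python
import json
from pathlib import Path

def run_regression(test_id, run_test, baseline_dir="baselines"):
    """Run one test; compare its output to a saved baseline, or record one.

    `run_test` is a zero-argument callable returning JSON-serializable output.
    Returns "baseline-created", "pass", or "fail".
    """
    baseline = Path(baseline_dir) / f"{test_id}.json"
    actual = run_test()
    if not baseline.exists():
        # First run: after human inspection, save the output for future comparisons.
        baseline.parent.mkdir(parents=True, exist_ok=True)
        baseline.write_text(json.dumps(actual))
        return "baseline-created"
    expected = json.loads(baseline.read_text())
    # Report an exception whenever current and saved output don't match.
    return "pass" if actual == expected else "fail"
```

Note how little of the slide's task list this automates: creating the test, inspecting the first output, and diagnosing a "fail" all remain human work.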
Really? This is automation?
• Analyze product & its risks -- Human
• Develop testing strategy -- Human
• Design tests -- Human
• Design oracles -- Human
• Run each test the first time -- Human
• Assess the tests -- Human
• Save the code -- Human
• Save the results for comparison -- Human
• Document the tests -- Human
• (Re-)Execute the tests -- Computer
• Evaluate the results -- Computer + Human
• Maintain the tests -- Human
• Manage test environment -- Human
• Keep archival records -- Human
This is computer-assisted testing, not automated testing.
ALL testing is computer-assisted.
Other computer-assistance…
• Tools to help create tests
• Tools to sort, summarize or evaluate test output or test results
• Tools (simulators) to help us predict results
• Tools to build models (e.g. state models) of the software, from which we can build tests and evaluate / interpret results
• Tools to vary inputs, generating a large number of similar (but not the same) tests on the same theme, at minimal cost for the variation
• Tools to capture test output in ways that make test result replication easier
• Tools to expose the API to the non-programmer subject matter expert, improving the maintainability of SME-designed tests
• Support tools for parafunctional tests (usability, performance, etc.)
Don't think "automated or not"
• Think continuum: more to less

Not "can we automate?"
• Instead: "can we automate more?"
A hypothetical
• System conversion (e.g. a FileMaker application to SQL)
  – Database application, 100 types of transactions, extensively specified (we know the fields involved in each transaction, and know their characteristics via the data dictionary)
  – 15,000 regression tests
  – Should we assess the new system by making it pass the 15,000 regression tests?
  – Maybe to start, but what about…
    ° Create a test generator to create high volumes of data combinations for each transaction. THEN:
    ° Randomize the order of transactions to check for interactions that lead to intermittent failures
  – This lets us learn things we don't know, and ask / answer questions we don't know how to study in other ways
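The two generator steps in this hypothetical can be sketched as follows. This is an illustrative sketch only: the field names, the sampling strategy, and both function names are invented, and a real generator would draw its domains from the data dictionary mentioned above.

```python
import itertools
import random

def generate_transaction_tests(field_domains, sample_size=None, seed=0):
    """Generate input combinations for one transaction type.

    `field_domains` maps field name -> list of candidate values (in the
    hypothetical, taken from the data dictionary). Returns one dict per
    combination; optionally a random sample when the cross product is huge.
    """
    names = list(field_domains)
    combos = [dict(zip(names, values))
              for values in itertools.product(*field_domains.values())]
    if sample_size is not None and sample_size < len(combos):
        combos = random.Random(seed).sample(combos, sample_size)
    return combos

def shuffled_run_order(tests, seed=0):
    """Randomize transaction order to probe for order-dependent,
    intermittent failures."""
    order = list(tests)
    random.Random(seed).shuffle(order)
    return order
```

With 100 transaction types, the full cross product is usually far too large to run, which is why the sketch supports sampling rather than exhaustive generation.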
Suppose you decided to never run another regression test. What kind of automation could you do?
[Diagram: a matrix of high-volume techniques against things we can vary. Techniques: fuzzing, sampling system, long-sequence regression, and oracles (model, reference, diagnostic, constraint). Things to vary: inputs (input filters, function, consequences, output filters), combinations, task sequences, file contents (input / reference / config), state transitions, and the execution environment.]
Issues that Drive the Design of Test Automation
• Theory of error: What kinds of errors do we hope to expose?
• Input data: How will we select and generate input data and conditions?
• Sequential dependence: Should tests be independent? If not, what information should persist or drive the sequence from test N to N+1?
• Execution: How well are test suites run, especially in the case of individual test failures?
• Output data: Which outputs do we observe, and which dimensions of them?
• Comparison data: If detection is via comparison to oracle data, where do we get the data?
• Detection: What heuristics/rules tell us there might be a problem?
• Evaluation: How do we decide whether X is a problem or not?
• Troubleshooting support: What further data collection does a failure trigger?
• Notification: How/when is a failure reported?
• Retention: In general, what data do we keep?
• Maintenance: How are tests / suites updated / replaced?
• Relevant contexts: Under what circumstances is this approach relevant/desirable?
Primary drivers of our designs

The primary driver of a design is the key factor that motivates us or makes the testing possible. In Doug's and my experience, the most common primary drivers have been:
• Theory of error
  – We're hunting a class of bug that we have no better way to find
• Available oracle
  – We have an opportunity to verify or validate a behavior with a tool
• Ability to drive long sequences
  – We can execute a lot of these tests cheaply
More on … Theory of Error
• Computational errors
• Communications problems
  – protocol errors
  – their-fault interoperability failures
• Resource unavailability or corruption, driven by
  – history of operations
  – competition for the resource
• Race conditions or other time-related or thread-related errors
• Failures caused by toxic data value combinations
  – that span a large portion or a small portion of the data space
  – that are likely or unlikely to be visible in "obvious" tests based on customer usage or common heuristics
Simulate Events with Diagnostic Probes
• 1984. First phone on the market with an LCD display.
• One of the first PBXs with integrated voice and data.
• 108 voice features, 110 data features.

Simulate traffic on the system, with
• Settable probabilities of state transitions
• Diagnostic reporting whenever a suspicious event is detected
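The core of such a traffic simulator can be sketched as a state machine driven by settable transition probabilities, with a diagnostic probe that logs suspicious events. Everything here is hypothetical (the function names, the state names, and the probe); the 1984 system would have been far richer.

```python
import random

def simulate_traffic(transitions, start, steps, is_suspicious, seed=0):
    """Drive a state machine with settable transition probabilities.

    `transitions` maps state -> list of (next_state, probability) pairs.
    `is_suspicious(old, new)` is the diagnostic probe; every hit is logged.
    Returns (final_state, list of suspicious (old, new) events).
    """
    rng = random.Random(seed)
    state, events = start, []
    for _ in range(steps):
        next_states, weights = zip(*transitions[state])
        nxt = rng.choices(next_states, weights=weights)[0]
        if is_suspicious(state, nxt):
            events.append((state, nxt))  # diagnostic report
        state = nxt
    return state, events
```

Because the probabilities are settable, the same engine can bias traffic toward rare paths (e.g. abandoned calls) that scripted tests seldom exercise.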
More on … Available Oracle

Typical oracles used in test automation:
• Reference program
• Model that predicts results
• Embedded or self-verifying data
• Checks for known constraints
• Diagnostics
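As a concrete illustration of a constraint oracle, consider integer square root (which reappears in the next slides): we need not predict the exact expected value, only check that the result r satisfies r² ≤ n < (r+1)². The function name below is invented for this sketch.

```python
def isqrt_constraint_oracle(n, r):
    """Constraint oracle for integer square root.

    Accepts any r with r*r <= n < (r+1)*(r+1), without ever computing
    the "expected" answer directly.
    """
    return r >= 0 and r * r <= n < (r + 1) * (r + 1)
```

A constraint oracle like this is cheap enough to run on millions of generated inputs, which is exactly what high-volume automation needs.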
Function Equivalence Testing
• MASPAR (the Massively Parallel computer, 64K parallel processors).
• The MASPAR computer has several built-in mathematical functions. We're going to consider the integer square root.
• This function takes a 32-bit word as an input. Any bit pattern in that word can be interpreted as an integer whose value is between 0 and 2^32 - 1. There are 4,294,967,296 possible inputs to this function.
• Tested against a reference implementation of square root.
Function Equivalence Test
• The 32-bit tests took the computer only 6 minutes to run and to compare the results to an oracle.
• There were 2 (two) errors, neither of them near any boundary. (The underlying error was that a bit was sometimes mis-set, but in most error cases there was no effect on the final calculated result.) Without an exhaustive test, these errors probably wouldn't have shown up.
• For the 64-bit integer square root, the function equivalence tests used random sampling rather than exhaustive testing, because the full set would have required 2^32 times as long: 6 minutes × 2^32.
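The equivalence-testing loop itself is trivial, which is the point: all the intelligence is in the reference oracle. Below is a sketch; the deliberately buggy implementation (mis-setting a bit for one input) is invented to mirror the kind of defect described above, and both function names are hypothetical.

```python
import math

def equivalence_test(function_under_test, reference, inputs):
    """Compare a function against a reference implementation over `inputs`.

    Returns the inputs where the two disagree. On the MASPAR, `inputs` was
    all 2**32 bit patterns; here any iterable of ints will do.
    """
    return [n for n in inputs
            if function_under_test(n) != reference(n)]

# Hypothetical buggy implementation: mis-sets a bit for one specific input.
def buggy_isqrt(n):
    r = math.isqrt(n)
    return r | 0x10 if n == 99 else r
```

Note that the failure at n = 99 sits nowhere near a boundary value; boundary-focused manual test design would likely never have selected it.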
This tests for equivalence of functions, but it is less exhaustive than it looks.
(Acknowledgement: From Doug Hoffman)

[Diagram: both the system under test and the reference function receive more than the intended inputs; each also depends on program state, system state, configuration and system resources, and cooperating processes, clients or servers. Likewise, each produces more than the monitored outputs: changes to program state and system state, uninspected outputs, and impacts on connected devices / resources and on cooperating processes, clients or servers. Comparing only the monitored outputs for the intended inputs ignores all of these other dimensions.]
More on … Ability to Drive Long Sequences

Any execution engine will (potentially) do:
• Commercial regression-test execution tools
• Customized tools for driving programs with (for example)
  – Messages (to be sent to other systems or subsystems)
  – Inputs that will cause state transitions
  – Inputs for evaluation (e.g. inputs to functions)
Long-sequence regression
• Tests are taken from the pool of tests the program has passed in this build.
• The sampled tests are run in random order until the software under test fails (e.g. crashes).
• Typical defects found include timing problems, memory corruption (including stack corruption), and memory leaks.
• Recent (2004) release: 293 reported failures exposed 74 distinct bugs, including 14 showstoppers.

Note:
• these tests are no longer testing for the failures they were designed to expose.
• these tests add nothing to typical measures of coverage, because the statements, branches and subpaths within them were covered the first time the tests were run in this build.
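A long-sequence regression runner reduces to a very small loop, sketched below under stated assumptions: each test is a zero-argument callable returning True on pass, a "failure" is anything else (in practice it would be a crash, a leak-detector trip, or a timeout), and the function name is invented.

```python
import random

def long_sequence_regression(passed_tests, budget, seed=0):
    """Run already-passed tests in random order until one fails or the
    budget of runs is exhausted.

    `passed_tests` maps test name -> zero-argument callable (True = pass).
    Returns (number_of_runs, name_of_failing_test or None).
    """
    rng = random.Random(seed)
    runs = 0
    while runs < budget:
        name, test = rng.choice(list(passed_tests.items()))
        runs += 1
        if not test():
            return runs, name  # e.g. crash, memory-leak trip, timing failure
    return runs, None
```

The interesting failures here are cumulative: a test that passes in isolation fails after enough prior history, which is exactly why these runs find leaks and corruption that single-test regression misses.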
Imagining a structure for high-volume automated testing
Some common characteristics
• The tester codes a testing process rather than individual tests.
• Following the tester's algorithms, the computer creates tests (maybe millions of tests), runs them, evaluates their results, reports suspicious results (possible failures), and reports a summary of its testing session.
• The tests often expose bugs that we don't know how to design focused tests to look for.
  – They expose memory leaks, wild pointers, stack corruption, timing errors and many other problems that are not anticipated in the specification but are clearly inappropriate (i.e. bugs).
  – Traditional expected results (the expected result of 2+3 is 5) are often irrelevant.
What can we vary?
• Inputs to functions
  – To check input filters
  – To check operation of the function
  – To check consequences (what the other parts of the program do with the results of the function)
  – To drive the program's outputs
• Combinations of data
• Sequences of tasks
• Contents of files
  – Input files
  – Reference files
  – Configuration files
• State transitions
  – Sequences in a state model
  – Sequences that drive toward a result
• Execution environment
  – Background activity
  – Competition for specific resources
• Message streams
How can we vary them?

Fuzzing:
• Random generation / selection of tests
• Execution engine
• Weak oracle (run till crash)

Fuzzing examples:
• Random inputs
• Random state transitions (dumb monkey)
• File contents
• Message streams
• Grammars

Statistical or AI sampling:
• Test selection optimized against some criteria

Long-sequence regression

Model-based oracles:
• E.g. a state machine
• E.g. a mathematical model

Reference program
Diagnostic oracle
Constraint oracle
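The "weak oracle (run till crash)" style of fuzzing can be sketched as below. Everything here is an illustrative assumption: the driver name, the input generator, and the deliberately fragile parser standing in for a system under test (in Python, an uncaught exception plays the role of a crash).

```python
import random

def fuzz(function_under_test, gen_input, trials, seed=0):
    """Dumb fuzzing with a weak oracle: feed random inputs and record any
    input that makes the function raise (the analogue of "run till crash").

    `gen_input` takes a random.Random and returns one generated input.
    Returns a list of (crashing_input, exception_name) pairs.
    """
    rng = random.Random(seed)
    crashes = []
    for _ in range(trials):
        x = gen_input(rng)
        try:
            function_under_test(x)
        except Exception as e:
            crashes.append((x, type(e).__name__))
    return crashes

# Hypothetical system under test: assumes every record has at least two fields.
def parse_record(s):
    return s.split(",")[1]
```

The oracle is weak (it notices only crashes, not wrong answers), but the technique scales to millions of inputs precisely because no expected results are needed.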
[Diagram: the matrix of techniques (fuzzing, sampling system, long-sequence regression; oracles: model, reference, diagnostic, constraint) against things to vary (inputs: input filters, function, consequences, output filters; combinations; task sequences; file contents: input / reference / config; state transitions; execution environment).]
Issues that Drive the Design of Test Automation (recap)
• Theory of error: What kinds of errors do we hope to expose?
• Input data: How will we select and generate input data and conditions?
• Sequential dependence: Should tests be independent? If not, what information should persist or drive the sequence from test N to N+1?
• Execution: How well are test suites run, especially in the case of individual test failures?
• Output data: Which outputs do we observe, and which dimensions of them?
• Comparison data: If detection is via comparison to oracle data, where do we get the data?
• Detection: What heuristics/rules tell us there might be a problem?
• Evaluation: How do we decide whether X is a problem or not?
• Troubleshooting support: What further data collection does a failure trigger?
• Notification: How/when is a failure reported?
• Retention: In general, what data do we keep?
• Maintenance: How are tests / suites updated / replaced?
• Relevant contexts: Under what circumstances is this approach relevant/desirable?
About Cem Kaner
• Professor of Software Engineering, Florida Tech

I've worked in all areas of product development: as a programmer, tester, writer, teacher, user interface designer, software salesperson, and organization development consultant; as a manager of user documentation, software testing, and software development; and as an attorney focusing on the law of software quality.

Senior author of three books:
• Lessons Learned in Software Testing (with James Bach & Bret Pettichord)
• Bad Software (with David Pels)
• Testing Computer Software (with Jack Falk & Hung Quoc Nguyen)

My doctoral research on psychophysics (perceptual measurement) nurtured my interests in human factors (usable computer systems) and measurement theory.