Automated Developer Testing: Achievements and Challenges


Tao Xie, North Carolina State University

contact: [email protected]

Automation in Developer Testing

• Background on developer testing
– http://www.developertesting.com/
– Kent Beck’s 2004 talk on “Future of Developer Testing”: http://www.itconversations.com/shows/detail301.html

• This talk focuses on developer testing
– Not system testing etc. conducted by testers

• “Unit test automation” commonly refers to writing unit test cases manually and executing them automatically

• Automation here is broad, including automatic test generation

Software Testing Setup

[Diagram: Program + Test inputs produce Outputs; Outputs =? Expected Outputs, as given by Test Oracles]

Software Testing Problems


[Diagram: Program + Test inputs, checked against Test Oracles]

• Faster: How can tools help developers create and run tests faster?

• Better Test Inputs: How can tools help generate new, better test inputs?

• Better Test Oracles: How can tools help generate better test oracles?

Example Unit Test Case



void addTest() {
    ArrayList a = new ArrayList(1);
    Object o = new Object();
    a.add(o);
    assertTrue(a.get(0) == o);
}

• Appropriate method sequence
• Appropriate primitive argument values
• Appropriate assertions

Test Case = Test Input + Test Oracle

Levels of Test Oracles

• Expected output for an individual test input
– In the form of assertions in test code

• Properties applicable for multiple test inputs
– Crash (uncaught exceptions) or not, related to robustness issues; supported by most tools
– Properties in production code: Design by Contract (preconditions, postconditions, class invariants), supported by Parasoft Jtest, Google CodePro AnalytiX
– Properties in test code: parameterized unit tests, supported by MSR Pex, AgitarOne

X. Xiao, S. Thummalapenta, and T. Xie. Advances on Improving Automation in Developer Testing. In Advances in Computers, 2012. http://people.engr.ncsu.edu/txie/publications.htm#ac12-devtest

Economics of Test Oracles

• Expected output for an individual test input
– Easy to manually verify for one test input
– Expensive/infeasible to verify for many test inputs
– Limited benefits: only for one test input

• Properties applicable for multiple test inputs
– Not easy to write (need abstraction skills)
– But once written, broad benefits for multiple test inputs

Assert Behavior of Multiple Test Inputs: Design by Contract

• Example tools: Parasoft Jtest, Google CodePro AnalytiX, MSR Code Contracts, MSR Pex

• Class invariant: properties being satisfied by an object (in a consistent state) [AgitarOne allows a class invariant helper method used as test oracles]

• Precondition: conditions to be satisfied (on receiver object and arguments) before a method can be invoked

• Postcondition: properties being satisfied (on receiver object and return) after the method has returned

• Other types of specs also exist
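The three kinds of contracts listed above can be sketched in plain Java with explicit runtime checks. This is only an illustration of the idea, not how Jtest, CodePro AnalytiX, or Code Contracts express contracts; the class `BoundedStack` and all of its members are hypothetical.

```java
// Hypothetical example: contracts written as explicit runtime checks.
final class BoundedStack {
    private final int[] items;
    private int count;

    BoundedStack(int capacity) {
        // Precondition on the constructor argument.
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        items = new int[capacity];
        checkInvariant();
    }

    // Class invariant: holds whenever the object is in a consistent state.
    private void checkInvariant() {
        if (items == null || count < 0 || count > items.length)
            throw new IllegalStateException("invariant violated");
    }

    int size() { return count; }

    void push(int value) {
        // Precondition: the receiver must not be full.
        if (count == items.length)
            throw new IllegalStateException("precondition violated: stack is full");
        int oldCount = count; // snapshot of the old state for the postcondition
        items[count++] = value;
        // Postcondition: size grew by one and the new top is the pushed value.
        if (count != oldCount + 1 || items[count - 1] != value)
            throw new IllegalStateException("postcondition violated");
        checkInvariant();
    }

    int top() {
        // Precondition: the receiver must not be empty.
        if (count == 0)
            throw new IllegalStateException("precondition violated: stack is empty");
        return items[count - 1];
    }
}
```

Because the checks live in production code, any test input that reaches `push` is checked against the same properties, which is what makes such contracts usable as oracles for many inputs at once.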

http://research.microsoft.com/en-us/projects/contracts/


Microsoft Research Code Contracts

[ContractInvariantMethod]
void ObjectInvariant() {
    Contract.Invariant( items != null );
}

Features
• Language expression syntax
• Type checking / IDE
• Declarative
• Special encodings: Result and Old

public virtual int Add(object value)
{
    Contract.Requires( value != null );
    Contract.Ensures( Count == Contract.OldValue(Count) + 1 );
    Contract.Ensures( Contract.Result<int>() == Contract.OldValue(Count) );
    if (count == items.Length) EnsureCapacity(count + 1);
    items[count] = value;
    return count++;
}

- Slide adapted from MSR RiSE
http://research.microsoft.com/en-us/projects/contracts/


Parameterized Unit Testing

void TestAdd(List list, int item) {
    Assume.IsTrue(list != null);
    var count = list.Count;
    list.Add(item);
    Assert.AreEqual(count + 1, list.Count);
}

• Parameterized Unit Test = Unit Test with Parameters

• Separation of concerns
– Data is generated by a tool
– Developer can focus on functional specification

[Tillmann&Schulte ESEC/FSE 05]

http://research.microsoft.com/apps/pubs/default.aspx?id=77419
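For readers more familiar with Java, the TestAdd idea can be sketched without any test framework. The class `PutSketch`, the exception `AssumptionViolated`, and the hand-picked inputs are all assumptions made for illustration; in the workflow described on the slide, a tool such as Pex would generate the inputs instead.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class PutSketch {
    // Thrown when an input does not satisfy the test's assumption; such inputs are skipped.
    static final class AssumptionViolated extends RuntimeException {}

    // Parameterized unit test: must hold for every list and item that pass the assumption.
    static void testAdd(List<Integer> list, int item) {
        if (list == null) throw new AssumptionViolated();
        int count = list.size();
        list.add(item);
        if (list.size() != count + 1) throw new AssertionError("size did not grow by one");
    }

    // Runs the PUT over a small hand-written input set and returns how many inputs were exercised.
    static int runAll() {
        List<List<Integer>> lists = new ArrayList<>();
        lists.add(new ArrayList<>());
        lists.add(new LinkedList<>());
        lists.add(null); // filtered out by the assumption
        int[] items = {0, -1, Integer.MAX_VALUE};
        int executed = 0;
        for (List<Integer> list : lists) {
            for (int item : items) {
                try {
                    testAdd(list, item);
                    executed++;
                } catch (AssumptionViolated skipped) {
                    // input lies outside the domain of the specification
                }
            }
        }
        return executed;
    }
}
```

The test body states the property once; the driver only decides which data to feed it, which is the separation of concerns the slide describes.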


Parameterized Unit Tests are Formal Specifications

Algebraic Specifications

• A Parameterized Unit Test can be read as a universally quantified, conditional axiom.

void TestReadWrite(Res r, string name, string data) {
    Assume.IsTrue(r != null && name != null && data != null);
    r.WriteResource(name, data);
    Assert.AreEqual(r.ReadResource(name), data);
}

∀ string name, string data, Res r:
    r ≠ null ∧ name ≠ null ∧ data ≠ null ⇒
    equals(ReadResource(WriteResource(r, name, data).state, name), data)

http://research.microsoft.com/pex/

Parameterized Unit Tests in Pex

Parameterized Unit Testing Getting Popular

Parameterized Unit Tests (PUTs) commonly supported by various test frameworks
• .NET: Supported by .NET test frameworks
– http://www.mbunit.com/
– http://www.nunit.org/
– …
• Java: Supported by JUnit 4.X
– http://www.junit.org/

Generating test inputs for PUTs supported by tools
• .NET: Supported by Microsoft Research Pex
– http://research.microsoft.com/Pex/
• Java: Supported by Agitar AgitarOne
– http://www.agitar.com/

Parameterized Test-Driven Development

[Workflow diagram]
• Write/refine contract as PUT
• Write/refine code of implementation
• Run Pex
– Failures: fix it (with Pex), debug with generated tests; the bug is either in the PUT or in the code
– No failures: use generated tests for regression

Assert Behavior of Multiple Test Inputs: Software Agitation in AgitarOne

[Workflow diagram with stages Code, Compile, Agitate, Review]
• Software agitation produces observations on code behavior, plus test coverage data
• If an observation reveals a bug, fix it
• If it describes desired behavior, click to create a test assertion

- Slide adapted from Agitar Software Inc.
http://www.agitar.com/

Software Agitation in AgitarOne

Image from http://www.agitar.com/

Automated Test Generation

• Recent advanced technique: Dynamic Symbolic Execution/Concolic Testing
– Instrument code to explore feasible paths
• Example tool: Pex from Microsoft Research (for .NET programs)

P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing. In Proc. PLDI 2005.
K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proc. ESEC/FSE 2005.
N. Tillmann and J. de Halleux. Pex - White Box Test Generation for .NET. In Proc. TAP 2008.

Dynamic Symbolic Execution in Pex

void CoverMe(int[] a)
{
    if (a == null) return;
    if (a.Length > 0)
        if (a[0] == 1234567890)
            throw new Exception("bug");
}

[Path tree: branch on a==null, then a.Length>0, then a[0]==123…, each with T/F outcomes]

Loop: Choose next path, Solve, Execute & Monitor

Constraints to solve                          | Input   | Observed constraints
(none)                                        | null    | a==null
a!=null                                       | {}      | a!=null && !(a.Length>0)
a!=null && a.Length>0                         | {0}     | a!=null && a.Length>0 && a[0]!=1234567890
a!=null && a.Length>0 && a[0]==1234567890     | {123…}  | a!=null && a.Length>0 && a[0]==1234567890

Done: There is no path left.

http://pex4fun.com/HowDoesPexWork
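To make the path exploration concrete, the following Java sketch ports CoverMe and replays the four inputs from the slide (null, {}, {0}, and an array starting with 1234567890). It does not perform symbolic execution or constraint solving; it only shows which path condition each input exercises. The class `CoverMeSketch` and the path labels are illustrative assumptions.

```java
final class CoverMeSketch {
    // Java port of CoverMe that returns a label for the path taken.
    static String coverMe(int[] a) {
        if (a == null) return "a==null";
        if (a.length > 0) {
            if (a[0] == 1234567890) throw new IllegalStateException("bug");
            return "a!=null && a.length>0 && a[0]!=1234567890";
        }
        return "a!=null && !(a.length>0)";
    }

    // Replays the four inputs in order; the last one reaches the throw statement.
    static String[] replay() {
        int[][] inputs = { null, {}, {0}, {1234567890} };
        String[] observed = new String[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            try {
                observed[i] = coverMe(inputs[i]);
            } catch (IllegalStateException e) {
                observed[i] = "a!=null && a.length>0 && a[0]==1234567890 (" + e.getMessage() + ")";
            }
        }
        return observed;
    }
}
```

Each input is obtained by negating the last branch condition observed on the previous run, which is why random testing is unlikely to find the fourth input while a constraint solver finds it directly.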



Automating Test Generation

• Method sequences
– MSeqGen/Seeker [Thummalapenta et al. OOPSLA 11, ESEC/FSE 09], Covana [Xiao et al. ICSE 2011], OCAT [Jaygarl et al. ISSTA 10], Evacon [Inkumsah et al. ASE 08], Symclat [d'Amorim et al. ASE 06]

• Environments, e.g., db, file systems, network, …
– DBApp Testing [Taneja et al. ESEC/FSE 11], [Pan et al. ASE 11]
– CloudApp Testing [Zhang et al. IEEE Soft 12]

• Loops
– Fitnex [Xie et al. DSN 09]

@NCSU ASE



Pex on MSDN DevLabs: Incubation Project for Visual Studio

Download counts (20 months, Feb. 2008 - Oct. 2009)
• Academic: 17,366
• DevLabs: 13,022
• Total: 30,388

http://research.microsoft.com/projects/pex/

Open Source Pex extensions: http://pexase.codeplex.com/

Publications: http://research.microsoft.com/en-us/projects/pex/community.aspx#publications

Writing Test Oracles: Learning Formal Methods!?

• Parameterized Unit Test = Unit Test with Parameters

• Separation of concerns
– Data is generated by a tool
– Developer can focus on functional specification

void TestAdd(List list, int item) {
    Assume.IsTrue(list != null);
    var count = list.Count;
    list.Add(item);
    Assert.AreEqual(count + 1, list.Count);
}

Automatic Test Generation: Human Assistance to Test Generation?!

Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…

Challenges Faced by Test Generation Tools

• Object-creation problems (OCP): 65%
• External-method call problems (EMCP): 27%

Total block coverage achieved is 50%, lowest coverage 16%.

Example: Dynamic Symbolic Execution/Concolic Testing
– Instrument code to explore feasible paths
– Challenge: path explosion

Example Object-Creation Problem

• A graph example from the QuickGraph library
• Includes two classes
– Graph
– DFSAlgorithm
• Graph
– AddVertex
– AddEdge: requires both vertices to be in graph

00: class Graph : IVEListGraph { …
03:   public void AddVertex (IVertex v) {
04:     vertices.Add(v); // B1
      }
06:   public Edge AddEdge (IVertex v1, IVertex v2) {
07:     if (!vertices.Contains(v1))
08:       throw new VNotFoundException("");
09:     // B2
10:     if (!vertices.Contains(v2))
11:       throw new VNotFoundException("");
12:     // B3
14:     Edge e = new Edge(v1, v2);
15:     edges.Add(e);
      }
    }

//DFS:DepthFirstSearch
18: class DFSAlgorithm { …
23:   public void Compute (IVertex s) { ...
24:     if (graph.GetEdges().Size() > 0) { // B4
25:       isComputed = true;
26:       foreach (Edge e in graph.GetEdges()) {
27:         ... // B5
28:       }
29:     }
      }
    }

[Thummalapenta et al. OOPSLA 11]

• Test target: Cover true branch (B4) of Line 24

• Desired object state: graph should include at least one edge

• Target sequence:

Graph ag = new Graph();
Vertex v1 = new Vertex(0);
Vertex v2 = new Vertex(1);
ag.AddVertex(v1);
ag.AddVertex(v2);
ag.AddEdge(v1, v2);
DFSAlgorithm algo = new DFSAlgorithm(ag);
algo.Compute(v1);



Example External-Method Call Problems (EMCP)

• Example 1: File.Exists has data dependencies on program input
– The subsequent branch at Line 1 uses the return value of File.Exists

• Example 2: Path.GetFullPath has data dependencies on program input
– Path.GetFullPath throws exceptions

• Example 3: String.Format does not cause any problem

Human Can Help! Object Creation Problems (OCP)

Tackle object-creation problems with Factory Methods
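A factory method packages the method sequence that a tool may fail to discover, so the tool only has to choose the factory's simple arguments. The Java sketch below is a hypothetical, much simplified analogue of the QuickGraph example; `GraphFactory.createGraphWithEdge` and the trimmed-down `Graph`, `Vertex`, `Edge`, and `DFSAlgorithm` classes are assumptions for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;

final class Vertex {
    final int id;
    Vertex(int id) { this.id = id; }
}

final class Edge {
    final Vertex source, target;
    Edge(Vertex source, Vertex target) { this.source = source; this.target = target; }
}

final class Graph {
    private final List<Vertex> vertices = new ArrayList<>();
    private final List<Edge> edges = new ArrayList<>();

    void addVertex(Vertex v) { vertices.add(v); }

    // Requires both vertices to be in the graph already.
    Edge addEdge(Vertex v1, Vertex v2) {
        if (!vertices.contains(v1) || !vertices.contains(v2))
            throw new IllegalArgumentException("vertex not found");
        Edge e = new Edge(v1, v2);
        edges.add(e);
        return e;
    }

    List<Edge> getEdges() { return edges; }
}

final class DFSAlgorithm {
    private final Graph graph;
    boolean isComputed;
    DFSAlgorithm(Graph graph) { this.graph = graph; }

    void compute(Vertex s) {
        if (graph.getEdges().size() > 0) { // the branch that is hard to cover
            isComputed = true;
        }
    }
}

final class GraphFactory {
    // Hand-written factory: encodes the sequence addVertex, addVertex, addEdge.
    static Graph createGraphWithEdge(int id1, int id2) {
        Graph g = new Graph();
        Vertex v1 = new Vertex(id1);
        Vertex v2 = new Vertex(id2);
        g.addVertex(v1);
        g.addVertex(v2);
        g.addEdge(v1, v2);
        return g;
    }
}
```

With the factory in place, covering the guarded branch in `compute` only requires calling `createGraphWithEdge` with any two integers, rather than searching the space of method sequences.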


Human Can Help! External-Method Call Problems (EMCP)

Tackle external-method call problems with Mock Methods or Method Instrumentation

Mocking System.IO.File.ReadAllText
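The same idea can be sketched in Java by hiding the file read behind a small interface and substituting a mock in tests. All names here (`TextSource`, `DiskTextSource`, `MockTextSource`, `FeatureFlag`) are hypothetical; this is a hand-rolled seam, not the Moles or Pex mechanism shown on the slide.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Seam around the external call so that tests can substitute it.
interface TextSource {
    String readAllText(String path);
}

// Production implementation: touches the real file system.
final class DiskTextSource implements TextSource {
    public String readAllText(String path) {
        try {
            return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Mock: returns whatever content the test (or a test generator) chooses.
final class MockTextSource implements TextSource {
    private final String content;
    MockTextSource(String content) { this.content = content; }
    public String readAllText(String path) { return content; }
}

// Code under test: its branch depends on the return value of the external call.
final class FeatureFlag {
    static boolean isEnabled(TextSource source, String path) {
        String text = source.readAllText(path);
        return text != null && text.trim().equals("enabled");
    }
}
```

Because the mock's return value is an ordinary test input, both outcomes of the branch in `isEnabled` become reachable without creating files on disk.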


State-of-the-Art/Practice Testing Tools

Running Symbolic PathFinder ...
…
====================================================== results
no errors detected
====================================================== statistics
elapsed time: 0:00:02
states: new=4, visited=0, backtracked=4, end=2
search: maxDepth=3, constraints=0
choice generators: thread=1, data=2
heap: gc=3, new=271, free=22
instructions: 2875
max memory: 81MB
loaded code: classes=71, methods=884
…

Tools typically don’t communicate challenges faced by them to enable cooperation between tools and users.

We typically don’t teach people how to cooperate with tools.

X. Xiao, T. Xie, N. Tillmann, and J. de Halleux. Precise Identification of Problems for Structural Test Generation. In Proc. ICSE 2011. http://people.engr.ncsu.edu/txie/publications/icse11-covana.pdf

Coding Duels

1,206,095 clicked 'Ask Pex!'

• Pex computes a “semantic diff” in the cloud: code written in the browser vs. a secret reference implementation
• You win when Pex finds no differences

Behind the Scene of Pex for Fun

Secret Implementation

class Secret {
    public static int Puzzle(int x) {
        if (x <= 0) return 1;
        return x * Puzzle(x-1);
    }
}

Player Implementation

class Player {
    public static int Puzzle(int x) {
        return x;
    }
}

class Test {
    public static void Driver(int x) {
        if (Secret.Puzzle(x) != Player.Puzzle(x))
            throw new Exception("Mismatch");
    }
}

Behavior: Secret Impl == Player Impl
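The driver idea can be tried out in Java with a brute-force search standing in for Pex. This sketch enumerates a small input range instead of using dynamic symbolic execution, so it only illustrates the comparison, not how Pex finds differing inputs; the class `DuelSketch` and the method `firstMismatch` are hypothetical.

```java
final class DuelSketch {
    // Secret reference implementation from the slide (factorial-like).
    static int secret(int x) {
        if (x <= 0) return 1;
        return x * secret(x - 1);
    }

    // Player's current attempt from the slide.
    static int player(int x) {
        return x;
    }

    // Returns the first input in [lo, hi] on which the two implementations differ,
    // or null if they agree on the whole range.
    static Integer firstMismatch(int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            if (secret(x) != player(x)) return x;
        }
        return null;
    }
}
```

For example, `firstMismatch(1, 5)` returns 3, because both implementations agree on 1 and 2 but the secret yields 6 for 3. A counterexample like this is the kind of feedback the player uses to refine the next attempt.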

Coding Duels: Fun and Engaging

• Iterative gameplay
• Adaptive
• Personalized
• No cheating
• Clear winning criterion

Example User Feedback

“It really got me *excited*. The part that got me most is about spreading interest in teaching CS: I do think that it’s REALLY great for teaching | learning!”

“I used to love the first person shooters and the satisfaction of blowing away a whole team of Noobies playing Rainbow Six, but this is far more fun.”

“I’m afraid I’ll have to constrain myself to spend just an hour or so a day on this really exciting stuff, as I’m really stuffed with work.”

Released since 2010


Coding Duel Competition @ICSE 2011

http://pexforfun.com/icse2011


Teaching and Learning

Coding Duels for Automatic Grading @Grad Software Engineering Course

http://pexforfun.com/gradsofteng


Coding Duels for Training Testing

public static string Puzzle(int[] elems, int capacity, int elem) {
    if ((capacity <= 0) || (elems == null) || (elems.Length > (capacity + 1)))
        return "Assumption Violation!";
    // Set up a stack with some elements
    Stack s = new Stack(capacity);
    for (int i = 0; i < elems.Length; i++)
        s.Push(elems[i]);
    // Cache values used in assertions
    int origSize = s.GetNumOfElements();

    //Please fill in below test scenario on the s stack

    //The lines below include assertions to assert the program behavior
    PexAssert.IsTrue(s.GetNumOfElements() == origSize + 1);
    PexAssert.IsTrue(s.Top() == elem);
    PexAssert.IsTrue(!s.IsEmpty());
    PexAssert.IsTrue(s.IsMember(elem));
    return s.GetNumOfElements().ToString() + "; " + s.Top().ToString() + "; "
        + s.IsMember(elem).ToString() + "; " + s.IsEmpty();
}

Usage Scenarios of Pex4Fun

• Massive Open Online Courses (MOOC): Challenges
– Grading, addressed by Pex4Fun
– Cheating [Open Challenge]

• Course assignments (students/professionals)
– E.g., intro programming, software engineering

• Student/professional competitions
– E.g., coding-duel competition at ICSE 2011

• Assessment of testing/programming/problem solving skills for job applicants
– Not just final results of problem solving but also process!

More Reading

Nikolai Tillmann, Jonathan De Halleux, Tao Xie, Sumit Gulwani, and Judith Bishop. Teaching and Learning Programming and Software Engineering via Interactive Gaming. In Proceedings of the 35th International Conference on Software Engineering (ICSE 2013), Software Engineering Education (SEE), San Francisco, CA, May 2013. http://people.engr.ncsu.edu/txie/publications/icse13see-pex4fun.pdf

Conclusion

• Software testing is important and yet costly; needs automation

• Better Test Inputs: help generate new, better test inputs
– Generate method arguments
– Generate method sequences

• Better Test Oracles: help generate better test oracles
– Assert behavior of individual test inputs
– Assert behavior of multiple test inputs

• Software Testing Educational Gaming
– http://www.pexforfun.com/

Example Industrial Developer Testing Tools

• Agitar AgitarOne http://www.agitar.com/
• Parasoft Jtest http://www.parasoft.com/
• Google CodePro AnalytiX https://developers.google.com/java-dev-tools/codepro/doc/
• SilverMark Test Mentor http://www.silvermark.com/
• Microsoft Research Pex (for .NET) http://research.microsoft.com/Pex/
• Microsoft Research Spec Explorer (for .NET) http://research.microsoft.com/specexplorer/

Trends in Practice

• Regression Test Selection/Prioritization
• Cloud Computing for Test Execution, e.g., http://www.skytap.com/
• Crowdsourcing for Testing, e.g., http://www.utest.com/
• Mocking Environments
– Google: EasyMock
– Microsoft VS: Fakes/Moles http://research.microsoft.com/en-us/projects/pex/
• Automatic Test Generation
– Microsoft: Pex, SAGE http://research.microsoft.com/en-us/um/people/pg/

Q & A

Thank you!

contact: [email protected]

Acknowledgments: NSF grants CCF-0845272, CCF-0915400, CNS-0958235, CNS-1160603, a Microsoft Research SEIF Award, and a Microsoft Research Award.

Automated Combinatorial Testing

Goals – reduce testing cost, improve cost-benefit ratio

Accomplishments – huge increase in performance, scalability, 200+ users, most major IT firms and others

Also non-testing applications – modelling and simulation, genome

http://csrc.nist.gov/groups/SNS/acts/index.html

Failure-triggering Interactions

• Additional studies consistent
• > 4,000 failure reports analyzed
• Conclusion: failures triggered by few variables

NIST ACTS Tool

• Covering array generator
• Coverage analysis - what is the combinatorial coverage of existing test set?
• .NET configuration file generator
• Fault characterization - ongoing

Current users: approximately 200 users as of July 2009, in IT, defense, finance, telecom, and many other industries

http://csrc.nist.gov/groups/SNS/acts/documents/comparison-report.html
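The coverage-analysis question above can be illustrated with a small Java sketch that measures 2-way (pairwise) coverage of an existing test set. This is an assumed, simplified measure written for illustration, not the algorithm used by ACTS; the class `PairwiseCoverage` and its parameters are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class PairwiseCoverage {
    // Fraction of all (parameter pair, value pair) combinations covered by the tests.
    // domainSizes[i] is the number of values of parameter i;
    // tests[t][i] is the value index chosen for parameter i in test t.
    static double coverage(int[] domainSizes, int[][] tests) {
        int total = 0;
        int covered = 0;
        for (int i = 0; i < domainSizes.length; i++) {
            for (int j = i + 1; j < domainSizes.length; j++) {
                total += domainSizes[i] * domainSizes[j];
                Set<Integer> seen = new HashSet<>();
                for (int[] test : tests) {
                    // Encode the value pair for parameters (i, j) as a single integer.
                    seen.add(test[i] * domainSizes[j] + test[j]);
                }
                covered += seen.size();
            }
        }
        return total == 0 ? 1.0 : (double) covered / total;
    }
}
```

For three boolean parameters, the four tests 000, 011, 101, 110 cover every value pair of every parameter pair, while exhaustive testing would need eight tests. This is the kind of saving that motivates covering arrays when failures are triggered by interactions of only a few variables.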

Defining a New System

Variable Interaction Strength

Constraints

Covering Array Output
