GPCE16: Automatic Non-functional Testing of Code Generators Families


Mohamed BOUSSAA

Olivier BARAIS

Gerson SUNYE

Benoit BAUDRY

2016 IEEE International Conference on Software Quality, Reliability & Security (QRS 2016)

August 1-3, 2016 - Vienna, Austria

INRIA Rennes, France

15th International Conference on Generative Programming: Concepts & Experiences (GPCE 2016) Amsterdam, Netherlands, October 31 – November 1, 2016


Outline

1. Context

2. Motivation

3. Automatic Non-functional Testing of Code Generators Families

4. Performance Evaluation

5. Conclusion


Context


All tests pass successfully, but…

What about the non-functional properties (quality) of the generated code?

• Code generators are used everywhere
• They automatically transform high-level system specifications (models, DSLs, GUIs, etc.) into general-purpose languages (JAVA, C++, C#, etc.)
• They target diverse and heterogeneous software platforms

Context


• Testing issues:

- Defective code generators may generate poor-quality code

- Testing the non-functional properties is time-consuming

- Requires examining several different non-functional requirements

- Code generators are complex and difficult to understand (they involve complex and heterogeneous technologies)

Motivation


Non-functional testing of code generators: the traditional way

• Analyze the non-functional properties of the generated code using platform-specific tools, profilers, etc.

Lack of tools for automatic non-functional testing of code generators

Automatic Non-functional Testing of Code Generators Families

https://testingcodegenerators.wordpress.com


Contributions


We propose:

• A runtime monitoring infrastructure, based on system containers (Docker) as execution platforms, that allows code-generator developers to evaluate the non-functional properties of the generated code

• A black-box testing approach to automatically detect potentially inefficient code generators

Microservice-based infrastructure


Execution and monitoring of the generated code using system containers

Different configurations, instances, images, machines, etc.

Resource isolation and management

Low performance overhead

Provide a fine-grained understanding and analysis of compiler behavior

Automatic extraction of non-functional properties relative to resource usage

Approach Overview


1. Code Execution: compile and execute the generated code within a new container instance

2. Runtime Monitoring: gather, at runtime, the non-functional properties of the programs under test

3. Time-series Database: save information about resource consumption in a time-series database

4. Performance Analysis: analyze the performance and non-functional properties of the programs under test


Testing Infrastructure

[Figure: testing infrastructure]
• Code Generation + Compilation: produces the component under test
• Component Under Test (running)
• Monitoring Component: reads CPU, memory, … usage from the cgroup file systems and produces monitoring records
• Back-end Database Component: time-series database (port 8086)
• Front-end: Visualization Component (HTTP requests)
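The infrastructure above reads resource usage from the cgroup file systems and stores monitoring records in a time-series database (InfluxDB in the evaluation, whose HTTP API listens on port 8086 by default). As a minimal sketch, assuming the cgroup v1 layout (single-integer stat files such as `memory.usage_in_bytes`) and InfluxDB line protocol as the record format, one monitoring record could be produced as shown below. The function names, the measurement name and the tags are hypothetical; in the actual experiments this role is played by Google cAdvisor, not by custom code.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Dict, Optional


def read_cgroup_counter(path: str) -> int:
    """Read a cgroup stat file holding a single integer.

    Assumption: cgroup v1 layout, e.g. memory.usage_in_bytes (bytes in use)
    or cpuacct.usage (cumulative CPU time in nanoseconds).
    """
    return int(Path(path).read_text().strip())


def _escape_tag(text: str) -> str:
    # Line protocol requires commas, equals signs and spaces inside tag keys
    # and tag values to be escaped with a backslash.
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line_protocol(measurement: str, tags: Dict[str, str], value: int,
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one monitoring record as an InfluxDB line-protocol point.

    Assumes the measurement name contains no spaces or commas. The integer
    field gets the 'i' suffix; the timestamp is in nanoseconds.
    """
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    tag_str = ",".join(f"{_escape_tag(k)}={_escape_tag(v)}"
                       for k, v in sorted(tags.items()))
    # Tags are optional in line protocol: omit the separating comma if none.
    prefix = f"{measurement},{tag_str}" if tag_str else measurement
    return f"{prefix} value={value}i {timestamp_ns}"


if __name__ == "__main__":
    # Simulate a cgroup stat file with a made-up value. On a real Docker host
    # the path would look something like
    # /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes
    with tempfile.NamedTemporaryFile("w", delete=False) as f:
        f.write("52428800\n")
    try:
        usage = read_cgroup_counter(f.name)
    finally:
        os.remove(f.name)
    print(to_line_protocol("memory_usage",
                           {"target": "php", "suite": "example"}, usage, 1000))
    # prints: memory_usage,suite=example,target=php value=52428800i 1000
```

Tags are sorted so that the same record always serializes identically; the target language and test suite are kept as tags because they are what the later per-suite comparison groups by.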

Testing Method


Definition (Code generator family): We define a code generator family as a set of code generators that take the same language/model as input and generate code for different target platforms (e.g., Haxe, ThingML)

Differential testing: compare equivalent implementations of the same program written in different languages

Standard deviation (std_dev): quantifies the amount of variation among the execution traces in terms of memory usage and execution time

Testing Method


Test suites with std_dev > threshold are interpreted as code generator inconsistencies
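The testing method can be sketched in a few lines: for each test suite, take the metric measured on every target language (execution time or memory usage), compute the standard deviation across targets, and flag the suite when it exceeds a threshold. This is a minimal sketch, assuming the population standard deviation (the slides do not say which variant is used) and a hypothetical input layout `{suite: {target: value}}`. The suite names and numbers in the demo are made up for illustration and are not results from the evaluation. The helper also reports the target farthest from the mean, which is how a singular target would stand out.

```python
import statistics
from typing import Dict, Tuple

# Hypothetical layout: {test_suite: {target_language: measured_value}}
Measurements = Dict[str, Dict[str, float]]


def flag_inconsistent_suites(data: Measurements,
                             threshold: float) -> Dict[str, Tuple[float, str]]:
    """Return {suite: (std_dev, outlier_target)} for suites with std_dev > threshold.

    std_dev is the population standard deviation of the metric across targets;
    outlier_target is the target whose value lies farthest from the mean.
    """
    flagged = {}
    for suite, per_target in data.items():
        values = list(per_target.values())
        std_dev = statistics.pstdev(values)
        if std_dev > threshold:
            mean = statistics.fmean(values)
            outlier = max(per_target, key=lambda t: abs(per_target[t] - mean))
            flagged[suite] = (std_dev, outlier)
    return flagged


if __name__ == "__main__":
    # Made-up execution times (seconds) for two hypothetical test suites.
    times = {
        "suite_a": {"cs": 1.0, "cpp": 1.0, "java": 1.0, "js": 1.0, "php": 1.0},
        "suite_b": {"cs": 0.0, "cpp": 0.0, "java": 0.0, "js": 0.0, "php": 100.0},
    }
    print(flag_inconsistent_suites(times, threshold=30))
    # prints: {'suite_b': (40.0, 'php')}
```

With identical times on all five targets the standard deviation is 0, so `suite_a` is not flagged; in `suite_b` one target dominates the variation, giving a standard deviation of 40.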

Evaluation

https://testingcodegenerators.wordpress.com/experimental-results/


Experimental Setup

• Programs under test: Haxe libraries + test suites
• Code generators under test: Haxe compilers, 5 targets: C#, C++, JAVA, JS, PHP
• Non-functional metrics: execution time (s), memory usage (MB)
• For monitoring: Google cAdvisor
• For storage: InfluxDB

Validation


• Comparison of the results of running each test suite across the five target languages; the metric used is the standard deviation of the execution times

• Standard deviations mostly lie between 0 and 8

• 8 data points where the std_dev is extremely high

Validation


Test suites with the highest variation in terms of execution time (k=60)

We can identify a singular behavior of the PHP code regarding execution time

Validation


• Comparison of the results of running each test suite across the five target languages; the metric used is the standard deviation of the memory consumption

• Standard deviations mostly lie between 0 and 150

• 6 data points where the std_dev is extremely high

Validation


Test suites with the highest variation in terms of memory usage (k=400)

We can identify a singular behavior of the PHP code regarding memory usage

Validation


For Color_TS4 in PHP:

• We observe intensive use of « arrays »

• We replace « arrays » with « SplFixedArray »

=> Speedup x5
=> Memory usage reduction x2

Conclusion


Summary

An approach for testing and monitoring code generator families using a container-based infrastructure

Automatic extraction of information about resource usage

The evaluation results show that we can find real issues in existing code generators (e.g., PHP)

Future directions

Detect more code generator issues (e.g., CPU consumption)

Evaluate our approach:
• On other code generator families
• Against other state-of-the-art approaches


https://testingcodegenerators.wordpress.com

Questions?

Tool Support


Visualization



Code Generators Testing: ThingML
