The Human Competitiveness of Search Based Software Engineering
Optimization in Software Engineering Group (GOES.UECE)
State University of Ceará, Brazil
Jerffeson Teixeira de Souza, Camila Loiola Maia, Fabrício Gomes de Freitas, Daniel Pinto Coutinho
2nd International Symposium on Search Based Software Engineering
September 7 – 9, 2010
Benevento, Italy
Jerffeson Teixeira de Souza, Ph.D.
Professor, State University of Ceará, Brazil
http://goes.comp.uece.br/
[email protected]
Nice to meet you.
Our little time will be divided as follows:
Part 01: Research Questions
Part 02: Experimental Design
Part 03: Results and Analyses
Part 04: Final Considerations
The question regarding the human competitiveness of SBSE has already been raised, but no comprehensive work on it has been published to date.
Why?
Mark Harman, The Current State and Future of Search Based Software Engineering, Proceedings of International Conference on Software Engineering / Future of Software Engineering 2007 (ICSE/FOSE '07), Minneapolis: IEEE Computer Society, pp. 342-357, 2007.
One may argue that the human competitiveness of SBSE is not in doubt within the SBSE community!
But, even if that is the case, strong research results on this issue would, at the very least, contribute to the increasing acceptance of SBSE outside its research community!
Can the results generated by Search Based Software Engineering be said to be human competitive? (SBSE human competitiveness)
But how to evaluate the human competitiveness of SBSE?
"The result holds its own or wins a regulated competition involving human contestants (in the form of either live human players or human-written computer programs)."
SSBSE 2010: HUMANS VS MACHINE
THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW
FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY
SSBSE 2010: HUMANS VS SBSE ALGORITHMS
THURSDAY, SEPTEMBER 9 – 11:30 CEST / 05:30 ET
LIVE ON PAY-PER-VIEW
FROM THE SOLD-OUT MUSEUM OF SANNIO ARENA, BENEVENTO, ITALY
Which ones?
Can the results generated by Search Based Software Engineering be said to be human competitive? (SBSE human competitiveness)
How do different metaheuristics compare in solving a variety of search based software engineering problems? (SBSE algorithms comparison)
The Problems
The Next Release Problem
The Multi-Objective Next Release Problem
The Workgroup Formation Problem
The Multi-Objective Test Case Selection Problem
Motivation: these can be considered "classical" formulations and, together, they cover three different general phases of the software development life cycle.
THE NEXT RELEASE PROBLEM
Involves determining the set of customers whose selected requirements will be delivered in the next software release.
This selection prioritizes customers with higher importance to the company and must respect a pre-determined budget:

maximize   Σ_{i∈S} w_i
subject to   cost(∪_{i∈S} R_i) ≤ B
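As a concrete illustration, the formulation can be solved exhaustively on a toy instance. The data below (customer weights, requirement sets, costs, and budget) is entirely hypothetical, and exhaustive search works only at this scale; the paper applies GA and SA to realistically sized instances of the same formulation.

```python
from itertools import combinations

# Hypothetical toy instance: customer importance w_i, each customer's
# requirement set R_i, per-requirement implementation costs, and budget B.
weights = {1: 5, 2: 3, 3: 8}
reqs = {1: {"r1", "r2"}, 2: {"r2"}, 3: {"r3"}}
cost = {"r1": 4, "r2": 2, "r3": 7}
budget = 9

def profit(S):
    # Sum of weights of the selected customers.
    return sum(weights[i] for i in S)

def total_cost(S):
    # Cost of the union of requirements needed by the selected customers.
    needed = set().union(*(reqs[i] for i in S)) if S else set()
    return sum(cost[r] for r in needed)

# Enumerate every customer subset that fits the budget and keep the best.
feasible = (S for k in range(len(weights) + 1)
            for S in combinations(weights, k)
            if total_cost(S) <= budget)
best = max(feasible, key=profit)
print(best, profit(best), total_cost(best))  # → (2, 3) 11 9
```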
THE MULTI-OBJECTIVE NEXT RELEASE PROBLEM
The cost of implementing the selected requirements is taken as an independent objective to be optimized, not as a constraint, along with a score representing the importance of each requirement:

minimize   Σ_{i=1}^{n} cost_i · x_i
maximize   Σ_{i=1}^{n} score_i · x_i
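A minimal sketch of how the two objectives trade off, using a random-search baseline over hypothetical cost and score data and a standard Pareto-dominance filter:

```python
import random

random.seed(1)

n = 8
cost = [random.randint(1, 10) for _ in range(n)]    # cost_i (hypothetical)
score = [random.randint(1, 10) for _ in range(n)]   # score_i (hypothetical)

def objectives(x):
    # x is a 0/1 inclusion vector: minimize total cost, maximize total score.
    c = sum(ci for ci, xi in zip(cost, x) if xi)
    s = sum(si for si, xi in zip(score, x) if xi)
    return c, s

def dominates(a, b):
    # a dominates b: no worse on both objectives, strictly better somewhere.
    return a[0] <= b[0] and a[1] >= b[1] and a != b

# Random search baseline: sample solutions, keep only the non-dominated ones.
sols = {tuple(random.randint(0, 1) for _ in range(n)) for _ in range(200)}
objs = {x: objectives(x) for x in sols}
front = [x for x in sols if not any(dominates(objs[y], objs[x]) for y in sols)]
print(len(front))
```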
THE WORKGROUP FORMATION PROBLEM
Deals with the allocation of human resources to project tasks.
The formulation displays a single objective function to be minimized, which composes salary costs with skill and preference factors. Its salary-cost component is

Σ_{p=1}^{P} Σ_{a=1}^{N} Sal_p · A_{pa} · Dur_a

together with the skill and preference terms and the allocation constraints of the original formulation.
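The salary-cost component of the objective (Sal_p · A_pa · Dur_a summed over persons and activities) can be computed directly; the sizes, salaries, durations, and allocation matrix below are hypothetical:

```python
# Hypothetical instance sizes: P persons, N activities.
P, N = 3, 4
Sal = [100, 120, 90]              # Sal_p: salary of person p per unit time
Dur = [2, 1, 3, 2]                # Dur_a: duration of activity a
# A[p][a] = 1 if person p is allocated to activity a, else 0.
A = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]

# Salary-cost term: sum of Sal_p * A_pa * Dur_a over all p and a.
salary_cost = sum(Sal[p] * A[p][a] * Dur[a]
                  for p in range(P) for a in range(N))
print(salary_cost)  # → 790  (100*2 + 100*2 + 120*1 + 90*3)
```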
THE MULTI-OBJECTIVE TEST CASE SELECTION PROBLEM
Extends previously published mono-objective formulations.
The paper discusses two variations: one considering two objectives (code coverage and execution time), used here, and another covering three objectives (code coverage, execution time and fault detection).
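A sketch of the two-objective evaluation (code coverage to maximize, execution time to minimize) on hypothetical test data:

```python
# Hypothetical data: which code blocks each test case covers, and how
# long each test case takes to run.
blocks_covered = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {5},
}
exec_time = {"t1": 4.0, "t2": 1.5, "t3": 2.0}
total_blocks = 6

def evaluate(selection):
    # Objective 1 (maximize): percentage of code blocks covered.
    # Objective 2 (minimize): total execution time of the selected tests.
    covered = (set().union(*(blocks_covered[t] for t in selection))
               if selection else set())
    coverage = 100.0 * len(covered) / total_blocks
    time = sum(exec_time[t] for t in selection)
    return coverage, time

print(evaluate({"t1", "t3"}))
```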
The Data
For each problem (NRP, MONRP, WORK and TEST), two instances, A and B, with increasing sizes, were synthetically generated.
INSTANCES FOR PROBLEM NRP
Instance   # Customers   # Requirements
NRPA       10            20
NRPB       20            40

INSTANCES FOR PROBLEM MONRP
Instance   # Customers   # Requirements
MONRPA     10            20
MONRPB     20            40

INSTANCES FOR PROBLEM WORK
Instance   # Persons   # Skills   # Activities
WORKA      10          5          5
WORKB      20          10         10

INSTANCES FOR PROBLEM TEST
Instance   # Test Cases   # Code Blocks
TESTA      20             40
TESTB      40             80
The Algorithms
For mono-objective problems: Genetic Algorithm (GA), Simulated Annealing (SA)
For multi-objective problems: NSGA-II, MOCell
For both mono- and multi-objective problems: Random Search
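As an illustration of the mono-objective search side, here is a generic simulated annealing sketch on a toy bit-string maximization problem; the objective, parameters, and cooling schedule are hypothetical, not the paper's setup:

```python
import math
import random

random.seed(0)

def fitness(x):
    # Toy maximization objective ("OneMax"): count of 1-bits.
    return sum(x)

def anneal(n=20, steps=2000, t0=1.0, alpha=0.995):
    x = [random.randint(0, 1) for _ in range(n)]
    best = list(x)
    t = t0
    for _ in range(steps):
        y = list(x)
        y[random.randrange(n)] ^= 1          # neighbour: flip one random bit
        delta = fitness(y) - fitness(x)
        # Always accept improvements; accept worsenings with probability
        # exp(delta / t), which shrinks as the temperature cools.
        if delta >= 0 or random.random() < math.exp(delta / t):
            x = y
        if fitness(x) > fitness(best):
            best = list(x)
        t *= alpha                           # geometric cooling schedule
    return best

print(fitness(anneal()))
```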
NUMBER OF HUMAN RESPONDENTS PER INSTANCE
Human Subjects
A total of 63 professional software engineers solved some or all of the instances.
Human SubjectsBesides solving the problem instance, each respondent wasasked to answer the following questions related to each problem instance
How hard was it to solve this problem instance?
How hard would it be for you to solve an instance twice this size?
What do you think the quality of a solution generated by you over an instance twice this size would be?
In addition to these specific questions regarding each problem instance, general questions on the respondent theoretical and
practical experience over software engineering
Comparison Metrics
For mono-objective problems: Quality
For multi-objective problems: Hypervolume, Spread, Number of Solutions
For both mono- and multi-objective problems: Execution Time
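Hypervolume measures the objective-space region dominated by a solution set relative to a reference point. A minimal 2-D sketch, assuming both objectives are minimized and a hypothetical reference point dominated by every front point:

```python
# 2-D hypervolume: area dominated by a non-dominated set, bounded by a
# reference point (assumed worse than every point on both objectives).
def hypervolume_2d(front, ref):
    pts = sorted(front)                  # ascending in f1 -> descending in f2
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # rectangle added by this point
        prev_f2 = f2
    return hv

front = [(2.0, 2.0), (1.0, 4.0), (3.0, 1.0)]
print(hypervolume_2d(front, ref=(5.0, 5.0)))  # → 12.0
```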
How do different metaheuristics compare in solving a variety of search based software engineering problems? (SBSE algorithms comparison)
RESULTS AND ANALYSES
RESULTS
Quality of results for NRP and WORK (averages and standard deviations over 100 executions)
Problem   GA                   SA                    RAND
NRPA      26.45±0.500          25.74±0.949           15.03±5.950
NRPB      95.41±0.190          90.47±7.023           45.74±11.819
WORKA     16,026.17±51.700     18,644.71±1,260.194   19,391.34±1,220.17
WORKB     24,831.23±388.107    35,174.19±2,464.733   36,892.64±2,428.269
Time (in milliseconds) for NRP and WORK (averages and standard deviations over 100 executions)
Problem   GA                  SA                  RAND
NRPA      40.92±11.112        23.01±7.476         0.00±0.002
NRPB      504.72±95.665       292.62±55.548       0.06±0.016
WORKA     242.42±44.117       73.35±19.702        0.04±0.010
WORKB     4,797.89±645.360    2,211.28±234.256    1.75±0.158
[Figure: boxplots showing average, maximum, minimum and 25%–75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for GA, SA and Random Search. Panels: NRPA, NRPB, WORKA, WORKB.]
Hypervolume results for MONRP and TEST (averages and standard deviations over 100 executions)
Problem   NSGA-II         MOCell          RAND
MONRPA    0.6519±0.009    0.6494±0.013    0.5479±0.0701
MONRPB    0.6488±0.015    0.6470±0.017    0.5462±0.0584
TESTA     0.5997±0.009    0.5867±0.019    0.5804±0.0648
TESTB     0.6608±0.020    0.6243±0.044    0.5673±0.1083
Spread results for MONRP and TEST (averages and standard deviations over 100 executions)
Problem   NSGA-II         MOCell          RAND
MONRPA    0.4216±0.094    0.3973±0.031    0.5492±0.1058
MONRPB    0.4935±0.098    0.3630±0.032    0.5504±0.1081
TESTA     0.4330±0.076    0.2659±0.038    0.5060±0.1029
TESTB     0.3503±0.178    0.2963±0.072    0.4712±0.1410
Time (in milliseconds) for MONRP and TEST (averages and standard deviations over 100 executions)
Problem   NSGA-II              MOCell               RAND
MONRPA    1,420.48±168.858     993.09±117.227       25.30±10.132
MONRPB    1,756.71±138.505     1,529.32±141.778     30.49±7.204
TESTA     1,661.03±125.131     1,168.47±142.534     25.24±11.038
TESTB     1,693.37±138.895     1,370.96±127.953     32.89±9.335
Number of solutions for MONRP and TEST (averages and standard deviations over 100 executions)
Problem   NSGA-II        MOCell         RAND
MONRPA    31.97±5.712    25.01±5.266    12.45±1.572
MONRPB    60.56±4.835    48.04±4.857    20.46±2.932
TESTA     35.43±4.110    26.20±5.971    12.54±1.282
TESTB     41.86±9.670    19.93±8.514    11.58±2.184
[Figure: example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance A. Axes: cost vs. value.]
[Figure: example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem MONRP, instance B. Axes: cost vs. value.]
[Figure: example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance A. Axes: cost vs. % coverage.]
[Figure: example of the obtained solution sets for NSGA-II, MOCell and Random Search over problem TEST, instance B. Axes: cost vs. % coverage.]
Can the results generated by Search Based Software Engineering be said to be human competitive? (SBSE human competitiveness)
RESULTS AND ANALYSES
Quality and time (in milliseconds) for NRP and WORK (averages and standard deviations)
Problem   SBSE Quality         SBSE Time            Humans Quality          Humans Time
NRPA      26.48±0.512          40.57±9.938          16.19±6.934             1,731,428.57±2,587,005.57
NRPB      95.77±0.832          534.69±91.133        77.85±23.459            3,084,000.00±2,542,943.10
WORKA     16,049.72±121.858    260.00±50.384        28,615.44±12,862.590    2,593,846.15±1,415,659.62
WORKB     25,047.40±322.085    4,919.30±1,219.912   50,604.60±20,378.740    5,280,000.00±3,400,588.14
[Figure: boxplots showing average, maximum, minimum and 25%–75% quartile ranges of quality for mono-objective problems NRP and WORK, instances A and B, for SBSE and the human subjects. Panels: NRPA, NRPB, WORKA, WORKB.]
Hypervolume and time (in milliseconds) for SBSE and humans for MONRP and TEST (averages and standard deviations)
Problem   SBSE HV         SBSE Time           Humans HV   Humans Time
MONRPA    0.6519±0.009    1,420.48±168.858    0.4448      1,365,000.00±1,065,086.42
MONRPB    0.6488±0.015    1,756.71±138.505    0.2870      2,689,090.91±2,046,662.91
TESTA     0.5997±0.009    1,661.03±125.131    0.4878      1,472,307.69±892,171.07
TESTB     0.6608±0.020    1,693.37±138.895    0.4979      3,617,142.86±3,819,431.52
[Figure: solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell, for problem MONRP, instance A. Axes: cost vs. value.]
[Figure: solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell, for problem MONRP, instance B. Axes: cost vs. value.]
[Figure: solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell, for problem TEST, instance A. Axes: cost vs. % coverage.]
[Figure: solutions generated by humans, and non-dominated solution sets produced by NSGA-II and MOCell, for problem TEST, instance B. Axes: cost vs. % coverage.]
FURTHER HUMAN COMPETITIVENESS ANALYSES
Human participants were asked to rate how difficult they found each problem instance and how confident they were in their solutions.
[Figure: bar chart showing the percentage of human respondents who considered each problem "hard" or "very hard".]
[Figure: bar chart showing the percentage of human respondents who were "confident" or "very confident".]
[Figure: bar charts showing percentage differences in quality between SBSE and the human subjects for the mono- and multi-objective problems NRP, MONRP, WORK and TEST.]
57.33% of the human participants who responded to instance A indicated that solving instance B would be "harder" or "much harder", and 55.00% predicted that their solution for such an instance would be "worse" or "much worse".
62.50% of the instance B respondents pointed out the increased difficulty of a problem instance twice as large, and 57.14% predicted that their solution would be "worse" or "much worse".
These results suggest that, for larger problem instances, the potential of SBSE to generate more accurate results than humans increases.
In fact, this suggests that SBSE may be particularly useful in solving real-world, large-scale software engineering problems.
Threats to Validity
Small instance sizes
Artificial instances
Number and diversity of human participants
Number of problems
FINAL CONSIDERATIONS
This paper reports the results of an extensive experimental study aimed at evaluating the human competitiveness of SBSE.
Secondarily, several tests were performed over four classical SBSE problems in order to evaluate the performance of well-known metaheuristics in solving both mono- and multi-objective problems.
Regarding the comparison of algorithms:
GA generated more accurate solutions than SA for the mono-objective problems.
NSGA-II consistently outperformed MOCell in terms of hypervolume and number of generated solutions.
MOCell outperformed NSGA-II in terms of spread and execution time.
All of these results are consistent with previously published research.
Regarding the human competitiveness question:
Experiments strongly suggest that the results generated by search based software engineering can, indeed, be said to be human competitive.
Results indicate that for real-world, large-scale software engineering problems, the benefits of applying SBSE may be even greater.
That is it! Thanks for your time and attention.