The Robust Optimization of Non-Linear Requirements Models


Master's Defense, WVU Dept. of CSEE. April 7, 2010.


Gregory Gay, Thesis Defense
West Virginia University, greg@greggay.com

The Robust Optimization of Non-Linear Requirements Models

Consider a requirements model… 2

  It contains:
    Various goals of a project.
    Methods for reaching those goals.
    Risks that prevent those goals.
    Mitigations that remove risks (but carry costs).

  A solution: a balance between cost and attainment.
  This is a non-linear optimization problem!

Understanding the Solution Space 3

  An open and pressing issue. [Harman '07]
  Many SE problems are over-constrained: there is no right answer, so give partial solutions.
  Robustness of solutions is key; many algorithms give brittle results. [Harman '01]
  It is important to present insight into the neighborhood: what happens if I do B instead of A?

The Naïve Approach 4

  Naive approaches to understanding the neighborhood:
    Run N times and report (a) the solutions appearing in more than N/2 cases, or (b) results with a 95% confidence interval.
  Both are flawed: they require multiple trials!

  Neighborhood assessment must be fast. [Feather '08]
    Real-time if possible.
    [Nielson '93] states that results must arrive within:
      1 second, before the mind begins to drift.
      10 seconds, before the mind has completely moved on.

Research Goals 5

  Two important concerns:
    Is demonstrating solution robustness a time-consuming task?
    Must solution quality be traded against solution robustness?

What We Want 6

  These are the features we want out of an algorithm (we want "more for less"):

| Category                | Feature                                    | Algorithm |
|-------------------------|--------------------------------------------|-----------|
| Neighborhood Assessment | High-Quality Results                       | Yes       |
| Neighborhood Assessment | Low Result Variance (Tame)                 | Yes       |
| Neighborhood Assessment | Scores Plateau to Stability (Well-behaved) | Yes       |
| Speed                   | Real-Time Results                          | Yes       |
| Speed                   | Scalability                                | Yes       |

Why Do We Care? 7

  The later in the development process a defect is identified, the more expensive it becomes to fix; the cost grows exponentially.

(Paraphrased from Barry Boehm [Boehm ‘81])

Roadmap 8

  Model → Algorithms → Experiments → Future Work → Conclusions

The Defect Detection and Prevention Model 9

  Used at NASA JPL by Martin Feather's "Team X". [Cornford '01, Feather '02, Feather '08, Jalali '08]
  An early-lifecycle requirements model.
  A light-weight ontology represents:
    Requirements: project objectives, weighted by importance.
    Risks: events that damage attainment of requirements.
    Mitigations: precautions that remove risks, but carry a cost value.
    Mappings: directed, weighted edges between requirements and risks, and between risks and mitigations.
    Part-of relations: provide structure between model components.
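The ontology above is described only in prose. As an illustration, here is a minimal sketch of how such a model could be represented and scored; the class names, fields, and the simplified evaluate() logic are assumptions for illustration, not DDP's actual schema or equations.

```python
from dataclasses import dataclass, field

@dataclass
class Requirement:
    name: str
    weight: float  # importance of this project objective

@dataclass
class Risk:
    name: str
    impacts: dict[str, float] = field(default_factory=dict)   # requirement name -> damage weight

@dataclass
class Mitigation:
    name: str
    cost: float
    reduces: dict[str, float] = field(default_factory=dict)   # risk name -> fraction removed

@dataclass
class DDPModel:
    requirements: list[Requirement]
    risks: list[Risk]
    mitigations: list[Mitigation]

    def evaluate(self, enabled: list[bool]) -> tuple[float, float]:
        """Return (cost, attainment) for a vector of enabled mitigations.
        A simplified stand-in for DDP's real equations."""
        cost = sum(m.cost for m, on in zip(self.mitigations, enabled) if on)
        # Fraction of each risk left after the enabled mitigations act on it.
        remaining = {}
        for risk in self.risks:
            frac = 1.0
            for m, on in zip(self.mitigations, enabled):
                if on:
                    frac *= 1.0 - m.reduces.get(risk.name, 0.0)
            remaining[risk.name] = frac
        # Each requirement contributes its weight, minus damage from the remaining risks.
        attainment = 0.0
        for req in self.requirements:
            damage = sum(remaining[r.name] * r.impacts.get(req.name, 0.0) for r in self.risks)
            attainment += req.weight * max(0.0, 1.0 - damage)
        return cost, attainment
```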

Light-weight != Trivial 10

Why Use DDP? 11

  Three reasons:
  1. It is demonstrably useful. [Feather '02, Feather '08]
    Cost savings often over $100,000.
    Numerous design improvements seen in DDP sessions.
    An overall shift in risks across JPL projects.
  2. Availability of real-world models, now and in the future. [Jalali '08]

The Third Reason 12

  DDP is representative of other requirements tools: a set of influences, expressed in a hierarchy, with relationships modeled through equations.
    Soft-Goals [Mylopoulos '99]
    QOC [MacLean '96]
  Sensitivity analysis [Cruz '73] is not applicable to DDP.

Using DDP 13

  Input = a set of enabled mitigations.
  Output = two values: (Cost, Attainment).
  Those values are normalized and combined into a single score [Jalali '08], as sketched below.
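The scoring equation appears only as an image in the original slide. A hedged reconstruction, assuming min-max normalization of cost and attainment and a score that rewards high attainment at low cost (the exact formula in [Jalali '08] may differ in its weighting):

```latex
% Assumed min-max normalization of raw cost c and attainment a:
\hat{c} = \frac{c - c_{\min}}{c_{\max} - c_{\min}}, \qquad
\hat{a} = \frac{a - a_{\min}}{a_{\max} - a_{\min}}

% Combined score: closeness to the ideal point of zero cost and full attainment,
% scaled so the best possible score is 1 (a reconstruction, not the thesis's exact form).
\mathrm{score} = \frac{\sqrt{\hat{a}^{\,2} + (1 - \hat{c})^{2}}}{\sqrt{2}}
```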

Roadmap 14

  Model → Algorithms → Experiments → Future Work → Conclusions

Search-Based Software Engineering 15

  Four factors must be met: [Harman '01, Harman '04]
    1. A large search space.
    2. Low computational complexity.
    3. Approximate continuity (in the score space).
    4. No known optimal solutions.

  The DDP problem fits all four:
    1. Some models have up to 2^99 ≈ 6.33 * 10^29 possible settings.
    2. Calculating the score is fast; the algorithms run in O(N^2).
    3. Discrete variables, but a continuous score space.
    4. Solutions depend on project settings, so the optimum is not known.

  There is no single solution, so reformulate this as a search problem and find several!

Theory of KEYS 16

  Theory: a minority of variables control the majority of the search space. [Menzies '07]
  If so, then a search that (a) finds those keys and (b) explores their ranges will rapidly plateau to stable, optimal solutions.
  This is not new: narrows, master-variables, back doors, and feature subset selection all work on the same theory. [Amarel '86, Crawford '94, Kohavi '97, Menzies '03, Williams '03]
  Everyone reports them, but few exploit them!

KEYS Algorithm 17

  Two components: a greedy search and a Bayesian ranking method (BORE = "Best or Rest").
  Each round, the greedy search:
    Generates 100 configurations of mitigations 1…M.
    Scores them.
    Sorts the top 10% of scores into the "best" group and the bottom 90% into the "rest" group.
    Ranks individual mitigations using BORE.
    Fixes the top-ranked mitigation for all subsequent rounds.
  Stop when every mitigation has a value; return the final cost and attainment values.

BORE Ranking Heuristic 18

  We do not have to actually search for the keys; we just keep frequency counts for the "best" and "rest" scores.
  BORE [Clark '05] is based on Bayes' theorem. Those frequency counts are used to calculate how likely each mitigation setting is to appear in the "best" group.
  To avoid low-frequency evidence, a support term is added (see the reconstruction below).
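Both equations on this slide are images in the original deck. A hedged reconstruction of the usual "best or rest" Bayesian ranking with a support term follows; the notation is mine and may differ from the thesis's exact presentation:

```latex
% Likelihoods that a mitigation setting x belongs to "best" or "rest",
% estimated from the frequency counts gathered during the greedy search:
b = P(x \mid \text{best})\,P(\text{best}), \qquad
r = P(x \mid \text{rest})\,P(\text{rest})

% Plain Bayesian ranking:
P(\text{best} \mid x) = \frac{b}{b + r}

% With a support term, so that rarely seen settings cannot dominate the ranking:
\mathit{rank}(x) = \frac{b^{2}}{b + r}
```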

KEYS vs KEYS2 19

  KEYS fixes a single top-ranked mitigation each round.
  KEYS2 [Gay '10] incrementally sets more: 1 in round 1, 2 in round 2, … M in round M.
    Slightly less tame, but much faster.
  (A sketch of both variants follows.)
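A minimal sketch of the KEYS/KEYS2 loop described on the last three slides, assuming a model(config) -> score callback to maximize and the support-based BORE rank; the probability estimates and tie-breaking are simplified for illustration.

```python
import random

def keys(model, num_mitigations, samples=100, keys2=False):
    """Greedy KEYS search. With keys2=True, round r fixes r decisions (KEYS2);
    otherwise each round fixes only the single top-ranked decision (KEYS)."""
    fixed = {}  # mitigation index -> True/False, decided in earlier rounds
    round_no = 0
    while len(fixed) < num_mitigations:
        round_no += 1
        # 1. Generate and score `samples` configurations that respect the fixed decisions.
        scored = []
        for _ in range(samples):
            cfg = [fixed.get(i, random.random() < 0.5) for i in range(num_mitigations)]
            scored.append((model(cfg), cfg))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # higher score = better
        cut = max(1, samples // 10)
        best, rest = scored[:cut], scored[cut:]

        # 2. BORE-style rank of every still-open (mitigation, value) setting.
        def rank(i, value):
            b = sum(cfg[i] == value for _, cfg in best) / len(best)
            r = sum(cfg[i] == value for _, cfg in rest) / max(1, len(rest))
            return b * b / (b + r) if b + r > 0 else 0.0

        open_settings = sorted(
            ((i, v) for i in range(num_mitigations) if i not in fixed for v in (True, False)),
            key=lambda iv: rank(*iv), reverse=True)

        # 3. KEYS fixes the top-ranked setting; KEYS2 fixes `round_no` of them.
        to_fix = round_no if keys2 else 1
        for i, v in open_settings:
            if to_fix == 0:
                break
            if i not in fixed:
                fixed[i] = v
                to_fix -= 1

    final = [fixed[i] for i in range(num_mitigations)]
    return final, model(final)
```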

Discovering KEYS2 20

  KEYS2 is simple; developing it was not!
  Many different heuristics were tried:
    Simple: set more in powers of 2, ln(round), top N%.
    Complex: shift the best/rest slope, neighborhood exploration, KEYS-R.
  A lesson in simplicity.

Benchmarked Algorithms 21

  KEYS must be benchmarked against standard SBSE techniques:
    Simulated Annealing, MaxWalkSat, A* Search.
  The chosen techniques are discrete, sequential, unconstrained algorithms. [Gu '97]
    Constrained searches work towards a pre-determined number of solutions; unconstrained searches adjust to their goal space.

Simulated Annealing 22

  A classic, yet still common, approach. [Kirkpatrick '83]
    Choose a random starting position.
    Look at a "neighboring" configuration.
      If it is better, go to it.
      If not, move based on guidance from a probability function (biased by the current temperature).
  Over time, the temperature lowers; wild jumps stabilize into small wiggles.
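A compact sketch of a simulated-annealing baseline of the kind described above, assuming a model(config) -> score to maximize, a geometric cooling schedule, and the classic Boltzmann acceptance rule; the thesis's actual parameters are not given on the slide.

```python
import math
import random

def simulated_annealing(model, num_mitigations, iterations=1000, t0=1.0, cooling=0.99):
    current = [random.random() < 0.5 for _ in range(num_mitigations)]
    current_score = model(current)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(iterations):
        # "Neighboring" configuration: flip one randomly chosen mitigation.
        neighbor = list(current)
        i = random.randrange(num_mitigations)
        neighbor[i] = not neighbor[i]
        score = model(neighbor)
        # Always accept improvements; otherwise accept with a probability that shrinks
        # as the temperature drops (wild jumps stabilize into small wiggles).
        if score > current_score or random.random() < math.exp((score - current_score) / temp):
            current, current_score = neighbor, score
            if score > best_score:
                best, best_score = list(neighbor), score
        temp *= cooling
    return best, best_score
```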

MaxWalkSat 23

  A hybridized local/random search. [Kautz '96, Selman '93]
  Start with a random configuration, then on each step perform either:
    Local search: move to a neighboring configuration with a better score (70% of the time).
    Random search: change one random mitigation setting (30% of the time).
  It keeps working towards a score threshold. It is allotted a certain number of resets, which it uses if it fails to pass the threshold within a certain number of rounds.
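A sketch of a MaxWalkSat-style baseline, assuming the 70/30 local/random split, a score threshold, and a reset budget as described above; the exact settings used in the thesis may differ.

```python
import random

def maxwalksat(model, num_mitigations, threshold, max_resets=10, max_rounds=500):
    best, best_score = None, float("-inf")
    for _ in range(max_resets):
        current = [random.random() < 0.5 for _ in range(num_mitigations)]
        score = model(current)
        for _ in range(max_rounds):
            if score >= threshold:
                return current, score  # passed the threshold: stop early
            if random.random() < 0.7:
                # Local search: flip whichever single mitigation most improves the score.
                flips = []
                for i in range(num_mitigations):
                    candidate = list(current)
                    candidate[i] = not candidate[i]
                    flips.append((model(candidate), candidate))
                score, current = max(flips, key=lambda pair: pair[0])
            else:
                # Random search: flip one randomly chosen mitigation.
                i = random.randrange(num_mitigations)
                current[i] = not current[i]
                score = model(current)
            if score > best_score:
                best, best_score = list(current), score
    return best, best_score
```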

A* Search 24

  A best-first path-finding heuristic. [Hart '68]
  Uses distance from the origin (G) and estimated cost to the goal (H), and moves to the neighbor that minimizes G+H.
  Moves to the new location and adds the previous location to a closed list to prevent backtracking.
  The search is optimal because the heuristic H never overestimates the remaining cost.
  Stops after being stuck for 10 rounds.

Other Methods 25

  [Gu '97]'s survey lists hundreds of methods!
  Gradient descent methods and sensitivity analysis assume a continuous range for model variables; DDP models are discrete!
  Integer programming could still be used (e.g. CPLEX [Mittelmann '07]):
    Too slow! [Coarfa '00]
    SE problems are over-constrained, so a solution over all constraints is not possible. [Harman '07]
  Parallel algorithms:
    The communication overhead overwhelms the benefits.

Roadmap 26

  Model → Algorithms → Experiments → Future Work → Conclusions

Experiment 1: Costs and Attainments 27

  We use the real-world models 2, 4, and 5 (models 1 and 3 are too small and were only used for debugging).
    These models are discussed in [Feather '02, Jalali '08, Menzies '03].
  Run each algorithm 1000 times per model.
    Generating a large number of data points removes outlier problems.
    The number is still small enough to collect results in a short time span.
  Graph the cost and attainment values; values towards the bottom-right are better.

Experiment 1 Results 28-30

  [Figures: scatter plots of cost (y-axis) against attainment (x-axis) for each model and algorithm; points towards the bottom-right are good, points towards the top-left are bad.]

Summing it Up… 31

| Feature                                    | Simulated Annealing | MaxWalkSat | A*  | KEYS | KEYS2 |
|--------------------------------------------|---------------------|------------|-----|------|-------|
| High-Quality Results                       | No                  | No         | Yes | Yes  | Yes   |
| Low Result Variance (Tame)                 | No                  | No         | No  | Yes  | Yes   |
| Scores Plateau to Stability (Well-behaved) | ?                   | ?          | ?   | ?    | ?     |
| Real-Time Results                          | ?                   | ?          | ?   | ?    | ?     |
| Scalability                                | ?                   | ?          | ?   | ?    | ?     |

Experiment 2: Runtimes 32

  For each model:
    Run each algorithm 100 times.
    Record the runtime using the Unix "time" command.
    Divide the total runtime by 100 to get the average.
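An equivalent in-process way to collect the same average (the thesis used the Unix time command over 100 runs; this Python version is just an illustration of the averaging):

```python
import time

def average_runtime(algorithm, model, num_mitigations, trials=100):
    """Average seconds per run of `algorithm`, measured over `trials` runs."""
    start = time.perf_counter()
    for _ in range(trials):
        algorithm(model, num_mitigations)
    return (time.perf_counter() - start) / trials
```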

Experiment 2 Results 33

Average runtime per run, in seconds:

| Algorithm           | Model 2 (31 mitigations) | Model 4 (58 mitigations) | Model 5 (99 mitigations) |
|---------------------|--------------------------|--------------------------|--------------------------|
| Simulated Annealing | 0.577                    | 1.258                    | 0.854                    |
| MaxWalkSat          | 0.122                    | 0.429                    | 0.398                    |
| A* Search           | 0.003                    | 0.017                    | 0.048                    |
| KEYS                | 0.011                    | 0.053                    | 0.115                    |
| KEYS2               | 0.006                    | 0.018                    | 0.038                    |

Summing it Up… 34

| Feature                                    | Simulated Annealing | MaxWalkSat | A*  | KEYS | KEYS2 |
|--------------------------------------------|---------------------|------------|-----|------|-------|
| High-Quality Results                       | No                  | No         | Yes | Yes  | Yes   |
| Low Result Variance (Tame)                 | No                  | No         | No  | Yes  | Yes   |
| Scores Plateau to Stability (Well-behaved) | ?                   | ?          | ?   | ?    | ?     |
| Real-Time Results                          | No                  | No         | Yes | Yes  | Yes   |
| Scalability                                | ?                   | ?          | ?   | ?    | ?     |

Experiment 3: Scale-Up Study 35

  By 2013, we expect DDP models 8x larger than those used in this thesis (which date from 2008).
  KEYS and KEYS2 are "real time" now, but will they scale?

| Year | Num. Variables |
|------|----------------|
| 2004 | 30             |
| 2008 | 100            |
| 2010 | 300            |
| 2013 | 800            |

Artificial Model Generation 36

  To test scalability, a generator builds synthesized models by:
    Studying the existing real-world models and collecting statistics on their internal structure.
    Mutating them into larger models based on user-supplied parameters that alter size, density, and distribution.
  Models were built that are 2, 4, and 8 times larger than the existing models.

Scale-Up Results (1) 37

Scale-Up Results (2) 38

Scale-Up Results (3) 39

Goodness of fit of the growth curves to the observed data:

| Fit                      | KEYS Runtimes | KEYS Model Calls | KEYS2 Runtimes | KEYS2 Model Calls |
|--------------------------|---------------|------------------|----------------|-------------------|
| Exponential              | 0.82          | 0.83             | 0.88           | 0.93              |
| Polynomial (of degree 2) | 0.99          | 0.99             | 0.99           | 0.98              |

  KEYS and KEYS2 both fit to O(N^2).
  Both scale to larger models, but KEYS requires exponentially more model calls (hence the large jump in execution time).
  Thus, we recommend KEYS2.
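The fits in the table above can be reproduced in spirit with a least-squares comparison such as the sketch below, assuming arrays of model sizes and measured runtimes; the thesis's exact fitting and correlation procedure is not specified on the slide.

```python
import numpy as np

def compare_fits(sizes, runtimes):
    """Compare an exponential fit (linear in log space) against a degree-2
    polynomial fit, returning an R^2-style goodness-of-fit score for each."""
    sizes = np.asarray(sizes, dtype=float)
    runtimes = np.asarray(runtimes, dtype=float)

    def r_squared(predicted):
        residual = np.sum((runtimes - predicted) ** 2)
        total = np.sum((runtimes - runtimes.mean()) ** 2)
        return 1.0 - residual / total

    # Exponential model: t = a * exp(b * n), i.e. log(t) = log(a) + b * n.
    b, log_a = np.polyfit(sizes, np.log(runtimes), 1)
    exp_pred = np.exp(log_a + b * sizes)

    # Polynomial model of degree 2: t = c2*n^2 + c1*n + c0.
    poly_pred = np.polyval(np.polyfit(sizes, runtimes, 2), sizes)

    return {"exponential": r_squared(exp_pred), "polynomial": r_squared(poly_pred)}
```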

Summing it Up… 40

| Feature                                    | Simulated Annealing | MaxWalkSat | A*    | KEYS | KEYS2 |
|--------------------------------------------|---------------------|------------|-------|------|-------|
| High-Quality Results                       | No                  | No         | Yes   | Yes  | Yes   |
| Low Result Variance (Tame)                 | No                  | No         | No    | Yes  | Yes   |
| Scores Plateau to Stability (Well-behaved) | ?                   | ?          | ?     | ?    | ?     |
| Real-Time Results                          | No                  | No         | Yes   | Yes  | Yes   |
| Scalability                                | No                  | No         | Maybe | No   | Yes   |

Decision Ordering Diagrams 41

  The design of KEYS2 automatically provides a way to explore the decision neighborhood.
  Decision ordering diagrams: a visual format that ranks decisions from most to least important. [Gay '10]

Decision Ordering Diagrams (2) 42

  These diagrams can be used to assess solution robustness in linear time by:
    (A) Considering the variance in performance after applying the first X decisions.
      Spread = a measure of variance (75th minus 25th percentile), as sketched below.
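A small sketch of that spread measure, assuming that for each decision count X we have the scores observed across sampled configurations applying the top X decisions:

```python
import numpy as np

def spread(scores):
    """Spread = 75th percentile minus 25th percentile of the observed scores."""
    q75, q25 = np.percentile(scores, [75, 25])
    return q75 - q25

def decision_ordering_spreads(scores_by_decision_count):
    """Entry X of the input holds the scores seen after applying the X highest-ranked
    decisions; returns the spread at each X for plotting on the diagram."""
    return [spread(scores) for scores in scores_by_decision_count]
```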

Decision Ordering Diagrams (3) 43

  These diagrams can be used to assess solution robustness in linear time by:
    (B) Comparing the results of using the first X decisions to those of using X-1 or X+1.

Decision Ordering Diagrams (4) 44

  These diagrams are useful under three conditions:
    (a) the scores output are well-behaved,
    (b) the variance is tamed,
    (c) they are generated in a timely manner.

Summing it Up… 45

| Feature                                    | Simulated Annealing | MaxWalkSat | A*    | KEYS | KEYS2 |
|--------------------------------------------|---------------------|------------|-------|------|-------|
| High-Quality Results                       | No                  | No         | Yes   | Yes  | Yes   |
| Low Result Variance (Tame)                 | No                  | No         | No    | Yes  | Yes   |
| Scores Plateau to Stability (Well-behaved) | No                  | No         | No    | Yes  | Yes   |
| Real-Time Results                          | No                  | No         | Yes   | Yes  | Yes   |
| Scalability                                | No                  | No         | Maybe | Yes  | Yes   |

Roadmap 46

  Model → Algorithms → Experiments → Future Work → Conclusions

Future Work 47

  A weakness in KEYS2: it must call the model 100 times per round.
    70% of execution time is spent calling the model.
  To improve it, call the model less often.
    Problem: doing this without losing support!

Future Directions 48

  Two main parts: a cache and active learning.
    Don't call the model; just look up scores.
    Clustering must be efficient. [Jiang '08, Poshyvanyk '08]
  First round: generate 100 random configurations.
    Greedily cluster them, assign identifiers to each, and build a hierarchical distance tree.
    After this, new instances can simply be dropped into the tree (as the number of clusters will remain fixed).
    The model only needs to be called for new instances.

Future Directions (2) 49

  Most learners are passive, blindly generating or reading data.
  Active learners exercise control over what data they learn from. [Cohn '94]
  Many clusters are linear: their members form a smooth area of the search space with similar scores. [Shepperd '97]
  If a new configuration falls into a linear region, don't poll the model; just interpolate its score.
  Train KEYS2 to outright ignore linear regions with poor scores. (A rough sketch of this idea follows.)
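A very rough sketch of the cached, cluster-based scoring idea from these two slides; the cluster representation, the linearity test, and the interpolation rule here are all assumptions about one possible realization, not the thesis's design.

```python
class ScoreCache:
    """Cache model scores by greedily clustering configurations; for clusters whose
    scores look locally flat ("linear"), interpolate instead of calling the model."""

    def __init__(self, model, radius=3, linear_tolerance=0.01):
        self.model = model
        self.radius = radius         # max Hamming distance to join an existing cluster
        self.tol = linear_tolerance  # max score range to treat a cluster as "linear"
        self.clusters = []           # each cluster: {"configs": [...], "scores": [...]}

    def score(self, config):
        config = [int(bool(x)) for x in config]
        for cluster in self.clusters:
            nearest = min(sum(a != b for a, b in zip(c, config)) for c in cluster["configs"])
            if nearest <= self.radius:
                scores = cluster["scores"]
                if len(scores) >= 3 and max(scores) - min(scores) <= self.tol:
                    # Linear region: interpolate from the neighbors rather than poll the model.
                    return sum(scores) / len(scores)
                value = self.model(config)
                cluster["configs"].append(config)
                scores.append(value)
                return value
        # No nearby cluster: call the model and start a new cluster.
        value = self.model(config)
        self.clusters.append({"configs": [config], "scores": [value]})
        return value
```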

KEYS as a Component 50

  Treatment Learning (NASA Ames):
    Work with Misty Davies and Karen Gundy-Burlet.
    Design process: run simulations, then apply treatment learning via TAR3 or TAR4.1. [Menzies '03b]
    KEYS is a multi-stage version of TAR4.1.
    Future: KEYS as part of the simulator.

  Bayesian Nets (Tsinghua University):
    Work with Hongyu Zhang.
    A Bayes net is a directed graph of variables and their influences.
    The variable ordering problem (VOP) [Hsu '04]: the order in which variables are examined by the learning process is crucial to performance!
    We need a ranking from best to worst. Genetic algorithms are sometimes used, but they are slow.
    KEYS offers a potential solution to the VOP.

Roadmap 51

  Model → Algorithms → Experiments → Future Work → Conclusions

Conclusions 52

  Optimization tools can study the space of requirements, risks, and mitigations.

  Finding a balance between costs and attainment is hard!

  Such solutions can be brittle, so we must comment on solution robustness. [Harman ‘01]

Conclusions (2) 53

  Pre-experimental concerns:
    An algorithm would need to trade solution quality for robustness (variance vs. score).
    Demonstrating solution robustness is time-consuming and requires multiple procedure calls.
  KEYS2 defies both concerns:
    It generates higher-quality solutions than the standard methods, and its results are tame and well-behaved (thus, we can generate decision ordering diagrams to assess robustness).
    It is faster than the other techniques, and can generate decision ordering diagrams in O(N^2).

We Recommend KEYS2 54

| Feature                                    | Simulated Annealing | MaxWalkSat | A*    | KEYS | KEYS2 |
|--------------------------------------------|---------------------|------------|-------|------|-------|
| High-Quality Results                       | No                  | No         | Yes   | Yes  | Yes   |
| Low Result Variance (Tame)                 | No                  | No         | No    | Yes  | Yes   |
| Scores Plateau to Stability (Well-behaved) | No                  | No         | No    | Yes  | Yes   |
| Real-Time Results                          | No                  | No         | Yes   | Yes  | Yes   |
| Scalability                                | No                  | No         | Maybe | No   | Yes   |

References 55

  Slide 3:
    [Harman '01] M. Harman and B.F. Jones. Search-based software engineering. Journal of Information and Software Technology, 43:833–839, December 2001.
    [Harman '07] Mark Harman. The current state and future of search based software engineering. In Future of Software Engineering, ICSE'07, 2007.
  Slide 4:
    [Feather '08] M. Feather, S. Cornford, K. Hicks, J. Kiper, and T. Menzies. Application of a broad-spectrum quantitative requirements model to early-lifecycle decision making. IEEE Software, 2008.
    [Nielson '93] J. Nielson. Usability Engineering. Academic Press, 1993.
  Slide 7:
    [Boehm '81] B. W. Boehm. Software Engineering Economics. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1981.
  Slide 9:
    [Cornford '01] S.L. Cornford, M.S. Feather, and K.A. Hicks. DDP: a tool for life-cycle risk management. In IEEE Aerospace Conference, Big Sky, Montana, pages 441–451, March 2001.
    [Feather '02] M.S. Feather and T. Menzies. Converging on the optimal attainment of requirements. In IEEE Joint Conference on Requirements Engineering, ICRE'02 and RE'02, 9-13 September, University of Essen, Germany, 2002.
    [Jalali '08] Tim Menzies, Omid Jalali, and Martin Feather. Optimizing requirements decisions with KEYS. In Proceedings of PROMISE '08 (ICSE), 2008.
  Slide 12:
    [Cruz '73] J.B. Cruz, editor. System Sensitivity Analysis. Dowden, Hutchinson & Ross, Stroudsburg, PA, 1973.
    [MacLean '96] A. MacLean, R.M. Young, V. Bellotti, and T.P. Moran. Questions, options and criteria: Elements of design space analysis. In T.P. Moran and J.M. Carroll, editors, Design Rationale: Concepts, Techniques, and Use, pages 53–106. Lawrence Erlbaum Associates, 1996.
    [Mylopoulos '99] J. Mylopoulos, L. Cheng, and E. Yu. From object-oriented to goal-oriented requirements analysis. Communications of the ACM, 42(1):31–37, January 1999.

References (2) 56

  Slide 15:
    [Harman '04] Mark Harman and John Clark. Metrics are fitness functions too. In 10th International Software Metrics Symposium (METRICS 2004), pages 58–69, Chicago, IL, USA. IEEE Computer Society Press, Los Alamitos, CA, USA, 2004.
  Slide 16:
    [Amarel '86] S. Amarel. Program synthesis as a theory formation task: Problem representations and solution methods. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach: Volume II, pages 499–569. Kaufmann, Los Altos, CA, 1986.
    [Crawford '94] J. Crawford and A. Baker. Experimental results on the application of satisfiability algorithms to scheduling problems. In AAAI '94, 1994.
    [Kohavi '97] Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
    [Menzies '03] T. Menzies and H. Singh. Many maybes mean (mostly) the same thing. In M. Madravio, editor, Soft Computing in Software Engineering. Springer-Verlag, 2003.
    [Menzies '07] T. Menzies, D. Owen, and K. Richardson. The strangest thing about software. Computer, 40(1):54–60, January 2007.
    [Williams '03] R. Williams, C.P. Gomes, and B. Selman. Backdoors to typical case complexity. In Proceedings of IJCAI 2003, 2003.
  Slide 18:
    [Clark '05] R. Clark. Faster Treatment Learning. Master's thesis, Computer Science, Portland State University, 2005.
  Slide 19:
    [Gay '10] Gregory Gay, Tim Menzies, Omid Jalali, Gregory Mundy, Beau Gilkerson, Martin Feather, and James Kiper. Finding robust solutions in requirements models. Automated Software Engineering, 17(1):87–116, 2010.

References (3) 57

  Slide 21:
    [Gu '97] Jun Gu, Paul W. Purdom, John Franco, and Benjamin W. Wah. Algorithms for the satisfiability (SAT) problem: A survey. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 19–152. American Mathematical Society, 1997.
  Slide 22:
    [Kirkpatrick '83] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 13 May 1983.
  Slide 23:
    [Kautz '96] Henry Kautz and Bart Selman. Pushing the envelope: Planning, propositional logic and stochastic search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence and the Eighth Innovative Applications of Artificial Intelligence Conference, pages 1194–1201, Menlo Park, August 4-8, 1996. AAAI Press / MIT Press. Available from http://www.cc.gatech.edu/~jimmyd/summaries/kautz1996.ps.
    [Selman '93] Bart Selman, Henry A. Kautz, and Bram Cohen. Local search strategies for satisfiability testing. In Michael Trick and David Stifler Johnson, editors, Proceedings of the Second DIMACS Challenge on Cliques, Coloring, and Satisfiability, Providence, RI, 1993.
  Slide 24:
    [Hart '68] P.E. Hart, N.J. Nilsson, and B. Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4:100–107, 1968.

References (4) 58

  Slide 25:
    [Coarfa '00] Cristian Coarfa, Demetrios D. Demopoulos, Alfonso San Miguel Aguirre, Devika Subramanian, and Moshe Y. Vardi. Random 3-SAT: The plot thickens. In Principles and Practice of Constraint Programming, pages 143–159, 2000.
    [Mittelmann '07] H.D. Mittelmann. Recent benchmarks of optimization software. In 22nd European Conference on Operational Research, 2007.
  Slide 48:
    [Jiang '08] Y. Jiang, B. Cukic, and T. Menzies. Does transformation help? In Defects 2008, 2008.
    [Poshyvanyk '08] D. Poshyvanyk, A. Marcus, and R. Ferenc. Using the conceptual cohesion of classes for fault prediction in object-oriented systems. IEEE Transactions on Software Engineering, 34(2):287–300, 2008.
  Slide 49:
    [Cohn '94] David Cohn, Les Atlas, and Richard Ladner. Improving generalization with active learning. Machine Learning, 15(2):201–221, 1994.
    [Shepperd '97] M. Shepperd and C. Schofield. Estimating software project effort using analogies. IEEE Transactions on Software Engineering, 23(12), November 1997.
  Slide 50:
    [Menzies '03b] T. Menzies and Y. Hu. Data mining for very busy people. IEEE Computer, November 2003. Available from http://menzies.us/pdf/03tar2.pdf.
    [Hsu '04] W.H. Hsu. Genetic wrappers for feature selection in decision tree induction and variable ordering in Bayesian network structure learning. Information Sciences, 163(1-3):103–122, June 2004.

Questions? 59

  Want to contact me later?
    Email: greg@greggay.com
    Facebook: http://facebook.com/greg.gay
    Twitter: http://twitter.com/Greg4cr
  More about me: http://www.greggay.com