Make the Most of Your Time: How Should the Analyst Work with Automated Traceability Tools?*

Alex Dekhtyar, Jane Huffman Hayes, Jody Larsen

*Funded by NASA

Introduction

• What is requirements traceability?
• Information retrieval and traceability
• What’s different about traceability?
• Research questions
• Experimental design & validity
• Results
• Analysis
• Future Work
• Questions

What is Requirements Traceability?

“Requirements traceability is the ability to describe and follow the life of a requirement, in both a forward and backward direction, i.e., from its origins, through its development and specification, to its subsequent deployment and use, and through periods of ongoing refinement and iteration in any of these phases.” Gotel and Finkelstein, 1994.

Why is Traceability Important?

• Requirements verification and validation

• Impact analysis

• Validating contracts and bids

• Testing Coverage

Information Retrieval and Traceability

[Diagram labels: Requirements Document, Design Document, Representation, Matching algorithm, Analyst]
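The slide only names the pipeline stages, so the following is a minimal sketch of one way they could fit together, assuming a plain TF-IDF vector space representation and cosine similarity as the matching algorithm. The function names, the whitespace tokenizer, and the sample texts are illustrative assumptions, not taken from any of the tools discussed here.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Map each tokenized document to a {term: tf-idf weight} dictionary."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def candidate_links(high_texts, low_texts):
    """Return (high_index, low_index, similarity) for every pair with similarity > 0."""
    docs = [text.lower().split() for text in high_texts + low_texts]
    vecs = tfidf_vectors(docs)
    high_vecs, low_vecs = vecs[:len(high_texts)], vecs[len(high_texts):]
    links = []
    for i, hv in enumerate(high_vecs):
        for j, lv in enumerate(low_vecs):
            score = cosine(hv, lv)
            if score > 0:
                links.append((i, j, score))
    return links

# Hypothetical sample texts, for illustration only.
high = ["the system shall log errors", "the system shall send telemetry"]
low = ["errors are written to the log file", "telemetry packets are sent hourly"]
links = candidate_links(high, low)
```

The resulting candidate links are what the analyst then reviews; pairs that share no terms are not proposed at all.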

What’s Different about Traceability?

• Typical Information Retrieval Problem:
  – Search Engines: Google, Yahoo
  – Large and small documents
  – Document indexes must support arbitrary search queries
  – High accuracy needed
• Requirements Traceability
  – Queries are known prior to indexing
  – Documents are all very small
  – High coverage needed

IR Based Traceability Frameworks

• ADAMS Re-Trace
  – Part of an artifact management suite
  – Latent Semantic Indexing (LSI) Based
  – Web Application
• Poirot: Tracemaker
  – Probabilistic Information Retrieval
  – Web Application
• RETRO
  – Vector Space Model and LSI
  – Windows Application

RETRO.NET

• Candidate Link
• Top Link
• Feedback
• Confirmed Links

Research Questions

• What is the optimal upper bound on analyst efficiency using Information Retrieval based tools?

• What analyst behaviors produce accurate and complete RTMs most efficiently?

Experimental Design

• Subject
  – A simulated analyst engaging in a specified behavior
• Variables
  – Independent
    • Analyst Behavior
  – Dependent
    • Confirmed True Links
    • Observed Links
    • Confirmed Recall
    • Precision
    • Selectivity

Measures

N - number of low-level requirements
M - number of high-level requirements
Hits - number of confirmed candidate links
Strikes - number of false positives
Misses - number of missed links

$$\text{Precision} = \frac{\text{Hits}}{\text{Hits} + \text{Strikes}}$$

$$\text{Recall} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}}$$

$$\text{Selectivity} = \frac{\text{Hits} + \text{Strikes}}{M \cdot N}$$
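As a worked check, the three measures can be written as small functions and applied to the CM-1 figures reported on the Dataset slide (36,556 candidate links, 358 of them correct, 361 correct links overall, 235 high-level and 220 low-level elements). The function names are illustrative; the strike and miss counts are derived from those figures rather than stated in the slides.

```python
def precision(hits, strikes):
    """Fraction of retrieved links that are correct."""
    return hits / (hits + strikes)

def recall(hits, misses):
    """Fraction of correct links that were retrieved."""
    return hits / (hits + misses)

def selectivity(hits, strikes, m, n):
    """Fraction of all M * N possible links that were retrieved."""
    return (hits + strikes) / (m * n)

# CM-1 figures from the Dataset slide; strikes and misses are derived values.
hits = 358
strikes = 36556 - hits   # retrieved links that are not correct
misses = 361 - hits      # correct links that were not retrieved

print(f"precision   {precision(hits, strikes):.2%}")
print(f"recall      {recall(hits, misses):.2%}")
print(f"selectivity {selectivity(hits, strikes, 235, 220):.2%}")
```

Recall and selectivity come out at the 99% and 70.7% shown on the Dataset slide. Precision evaluates to just under 1%, so the 0.1% printed there may be a rounding or transcription artifact.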

Global Ordering Without Feedback

• All candidate link lists merged

• Links processed one at a time in descending order

• Results in analyst looking at links that have highest global similarity ratings first
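A minimal sketch of global ordering, assuming each high-level requirement has a candidate list of (low-level id, similarity) pairs. The data layout, function name, and sample values are assumptions made for illustration.

```python
def global_order(candidate_lists):
    """Merge every candidate list and sort all links by similarity, highest first.

    candidate_lists: {high_id: [(low_id, similarity), ...]}
    """
    merged = [(high, low, score)
              for high, links in candidate_lists.items()
              for low, score in links]
    return sorted(merged, key=lambda link: link[2], reverse=True)

# Hypothetical candidate lists.
lists = {
    "H1": [("L1", 0.9), ("L2", 0.2)],
    "H2": [("L1", 0.5), ("L3", 0.4)],
}
review_order = global_order(lists)
```

With this ordering the analyst may see several links for one requirement before seeing any link for another.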

Local Ordering Without Feedback

• Each candidate link list sorted in descending order

• Top link for each requirement element is processed

• Results in analyst looking at the links for each requirement by highest similarity rating
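One reading of local ordering is a round-robin over the per-requirement lists: the best link of every requirement first, then every second-best link, and so on. The sketch below follows that reading; the data layout, function name, and sample values are assumptions.

```python
def local_order(candidate_lists):
    """Visit the top link of each requirement, then each second link, and so on.

    candidate_lists: {high_id: [(low_id, similarity), ...]}
    """
    ranked = {high: sorted(links, key=lambda link: link[1], reverse=True)
              for high, links in candidate_lists.items()}
    depth = max((len(links) for links in ranked.values()), default=0)
    order = []
    for rank in range(depth):
        for high, links in ranked.items():
            if rank < len(links):
                low, score = links[rank]
                order.append((high, low, score))
    return order

# Hypothetical candidate lists.
lists = {
    "H1": [("L2", 0.2), ("L1", 0.9)],
    "H2": [("L1", 0.5), ("L3", 0.4)],
}
review_order = local_order(lists)
```

Every requirement gets attention in each pass, even when its best link has a lower similarity than another requirement's weaker links.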

Global Ordering With Feedback

• All candidate link lists merged

• Links processed one at a time in descending order

• After each link is processed, feedback is executed

• Results in analyst looking at links that have highest global similarity ratings first

Local Ordering With Feedback

• Each candidate link list sorted in descending order

• Top link for each requirement element is processed

• After all top links have been processed, feedback is executed

• Results in analyst looking at the links for each requirement by highest similarity rating
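The loop below sketches the local, feedback behavior with a simulated analyst who answers from a known set of true links. The slides do not say how feedback recomputes similarities, so `rerank` is a hypothetical hook (a real tool might apply, for example, a Rocchio-style update); `no_change` is only a placeholder, and all names and data are assumptions.

```python
def local_with_feedback(scores, true_links, rerank, rounds):
    """Simulate an analyst who checks each requirement's top unseen link per round.

    scores: {high_id: {low_id: similarity}}
    true_links: set of (high_id, low_id) pairs the simulated analyst confirms
    rerank: hypothetical feedback hook, called once after every round
    """
    seen, confirmed = set(), set()
    for _ in range(rounds):
        verdicts = {}
        for high, lows in scores.items():
            unseen = [low for low in lows if (high, low) not in seen]
            if not unseen:
                continue
            top = max(unseen, key=lambda low: lows[low])
            seen.add((high, top))
            verdicts[(high, top)] = (high, top) in true_links
        if not verdicts:
            break  # nothing left to review
        confirmed.update(link for link, ok in verdicts.items() if ok)
        # Feedback runs only after all top links in this round were processed.
        scores = rerank(scores, verdicts)
    return seen, confirmed

def no_change(scores, verdicts):
    """Placeholder feedback step that leaves the similarities untouched."""
    return scores

# Hypothetical similarities and answer set.
scores = {"H1": {"L1": 0.9, "L2": 0.2}, "H2": {"L1": 0.5, "L3": 0.4}}
truth = {("H1", "L1"), ("H2", "L3")}
seen, confirmed = local_with_feedback(scores, truth, no_change, rounds=2)
```

The size of `seen` corresponds to the observed links and the size of `confirmed` to the confirmed true links among the dependent variables.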

Dataset

• CM-1
• http://mdp.ivv.nasa.gov/mdp_glossary.html#CM1
• In PROMISE repository
• Sanitized dataset for a NASA scientific instrument
• Goal is to help analyst avoid examining all 36,556 links

# of elements in requirements (high-level)                235
# of elements in design specification (low-level)         220
# of correct links                                        361
Total # of retrieved candidate links                      36,556
Total # of correct links retrieved by basic Vector Space  358
Recall                                                    99%
Precision                                                 0.1%
Selectivity                                               70.7%

Experimental Design Detail

• Two Studies
  – Impact of Behavior on Analyst Effort when Recall is fixed
    • Recall set to 89%
    • Included all four structured behaviors
  – Impact of Behavior on Precision and Recall when Analyst Effort is fixed
    • Effort fixed to 1595 observed links
    • Added two additional behaviors
      – Global, Random Order
      – Local, Random Order

Random?

• Menzies et al. note that software analysis should “start with random methods because they are so cheap, moving to the more complex methods only when random methods fail.”

• Source: Menzies, T., Owen, D., and Richardson, J. The Strangest Thing about Software. IEEE Computer, January 2007, pp. 54 – 60.

Random Methods

• Global Filtered Random Selection
  – All Candidate Link Lists Merged
  – List Sorted By Similarity
  – Bottom 75% (Least Relevant) Removed
  – 1595 Links Selected Randomly
• Local Filtered Random Selection
  – All Candidate Link Lists Sorted By Similarity
  – Bottom 75% (Least Relevant) Removed
  – 7 or 8 Links Selected Randomly For Each Requirement
  – 1595 Links Selected
• Executed 1000 times
• Median values were used
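The two random behaviors above can be sketched as follows, assuming candidate lists of (low-level id, similarity) pairs per requirement. The function names, parameters, and sample data are assumptions; in particular, the local version draws a fixed number of links per requirement rather than reproducing the exact 7-or-8 split.

```python
import random

def global_filtered_random(candidate_lists, keep_fraction=0.25, k=1595, seed=None):
    """Merge all lists, keep the most similar fraction, then sample k links."""
    merged = [(high, low, score)
              for high, links in candidate_lists.items()
              for low, score in links]
    merged.sort(key=lambda link: link[2], reverse=True)
    kept = merged[: int(len(merged) * keep_fraction)]  # drops the bottom 75% by default
    rng = random.Random(seed)
    return rng.sample(kept, min(k, len(kept)))

def local_filtered_random(candidate_lists, keep_fraction=0.25, per_requirement=7, seed=None):
    """Filter each list separately, then sample a few links per requirement."""
    rng = random.Random(seed)
    picked = []
    for high, links in candidate_lists.items():
        ranked = sorted(links, key=lambda link: link[1], reverse=True)
        kept = ranked[: int(len(ranked) * keep_fraction)]
        chosen = rng.sample(kept, min(per_requirement, len(kept)))
        picked.extend((high, low, score) for low, score in chosen)
    return picked

# Hypothetical data: two requirements with eight candidate links each.
lists = {high: [(f"L{i}", i / 10) for i in range(1, 9)] for high in ("H1", "H2")}
global_pick = global_filtered_random(lists, k=3, seed=1)
local_pick = local_filtered_random(lists, per_requirement=1, seed=1)
```

Repeating either selection with different seeds and taking the median of the resulting measures (for instance with `statistics.median`) would mirror the 1000-run procedure described above.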

Experiment Operation

• Input:
  – Requirement and Specification Documents
  – Desired Analyst Behavior
• Operation:
  – Performs multiple rounds of interaction
  – Provides link status determination for each observed link
• Output:
  – Candidate RTM
  – Dependent Variable Data

Experiment Validity

• Internal Threats
  – Accuracy of Requirements Traceability Matrix
  – Suitability of text for Information Retrieval methods
• External Threats
  – Can Results Based on CM-1 Dataset be Generalized?
  – Realism of Simulated Analyst Behavior

Study 1 Results, Fixed Recall

Behavior              Observed Links   Precision   Selectivity
Local, Feedback       1595             20%         3%
Local, No Feedback    1713             19%         3%
Global, Feedback      5399             6%          10%
Global, No Feedback   6149             5%          12%

Study 2 Results, Fixed Effort

Behavior              Confirmed Links   Precision   Recall
Local, Feedback       321               20%         89%
Local, No Feedback    326               20%         87.5%
Global, Feedback      236               15%         65%
Global, No Feedback   227               14%         63%
Local, Random         65                4%          18%
Global, Random        58                3.6%        16%
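Because effort is fixed at 1595 observed links and CM-1 has 361 correct links, precision and recall in this table follow directly from the confirmed-link counts. The short check below recomputes them; the helper name is an assumption.

```python
OBSERVED_LINKS = 1595   # fixed effort in Study 2
CORRECT_LINKS = 361     # correct links in CM-1

confirmed = {
    "Local, Feedback": 321,
    "Local, No Feedback": 326,
    "Global, Feedback": 236,
    "Global, No Feedback": 227,
    "Local, Random": 65,
    "Global, Random": 58,
}

def rates(hits):
    """Return (precision, recall) for a given number of confirmed links."""
    return hits / OBSERVED_LINKS, hits / CORRECT_LINKS

for behavior, hits in confirmed.items():
    p, r = rates(hits)
    print(f"{behavior:<20} precision {p:.1%}  recall {r:.1%}")
```

All rows agree with the table after rounding except Local, No Feedback: 326 confirmed links would give a recall of about 90.3%, while the listed 87.5% corresponds to roughly 316 links, so one of those two figures may have been garbled.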

Study 2 Intermediate Results

[Figure: Observed Correct Links (y-axis, 0 to 350) versus Observed Candidate Links (x-axis, 0 to 2000) for four behaviors: Global, No Feedback; Global, Feedback; Local, Feedback; Local, No Feedback]

Analysis

• Knowing when to stop is crucial
  – Accounts for largest difference between local versus global
• Feedback helps
  – Feedback resulted in a 7% to 13% reduction of effort
• Have a system
  – Unstructured analyst behavior is not efficient

Future Work

• User Interface Design

• Larger Datasets

• Live Analysts

Questions?
