Poznan University of Technology
Faculty of Computing
Identification of Events in Use Cases
Jakub Jurkiewicz
A dissertation
submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy.
Supervisor: Jerzy Nawrocki, PhD, Dr Habil.
Co-supervisor: Mirosław Ochodek, PhD
Poznan, Poland, 2013
A B S T R A C T
Context: Requirements are one of the most important elements of software development. They involve many stakeholders, including IT professionals (Analysts, Project Managers, Architects, etc.) and IT laymen (e.g. Customers, End Users, Domain Experts, Investors). In order to build a bridge between those two groups, requirements are often written in a natural language, e.g. in the form of use cases. They describe interaction between an end-user (called “actor”) and the system being built. The most important element of a use case is a main scenario leading to obtaining the actor’s goal in the most “typical” way. A main scenario is a sequence of steps taken by the actor or the system. Each step describes a simple action in a natural language (e.g. “Student enters his account and password.”). Deviations from the main scenario are also documented within a use case. They are usually called extensions and are described as alternative interactions. Each extension is a response to a given event causing the deviation (e.g. “Account or password is wrong.”). The first problem is effectiveness of event identification, i.e. the degree of completeness of identification of the events causing deviations from the main scenario (in an ideal case all the events should be identified). Another problem is efficiency, i.e. the time necessary to identify events. Identifying all such events is very important, as unidentified events could negatively impact not only coding but also effort estimation, user interface design, user manual writing, architecture evaluation, etc. In general, three approaches to event identification are possible: 1) an ad hoc approach used by humans, 2) a specialised method of event identification to be used by humans, 3) an automatic method to be executed by a computer.
Objective: The aim of the thesis is to propose an automatic method of event identification that would not be worse than identification made by humans, in terms of effectiveness (event completeness) and efficiency (time).
Method: To develop an automatic method of event identification, a pool of almost 200 use cases was divided equally into a learning part and a testing part. Through analysis of the learning part, 14 abstract event types and two inference rules were identified. Using NLP tools (Stanford Parser, OpenNLP) and a use case editor (UC Workbench), a pilot version of a tool for automatic event identification in use cases was designed and implemented. The tool was used to evaluate both the effectiveness (event completeness) and efficiency (time) of the proposed method against results obtained from human-based experiments performed in academia (18 students) and industry (64 practitioners). 9 students and 32 practitioners used the ad hoc approach; similarly, 9 students and 32 practitioners used a HAZOP-based method called H4U. Both the human-based experiment and the evaluation of the proposed method were based on a use-case benchmark developed at the Poznan University of Technology, Faculty of Computing.
Results: From the thesis it follows that:
• the average effectiveness of event identification performed by humans using the ad hoc approach is at the level of 0.18;
• the average effectiveness of event identification performed by humans using the HAZOP-based approach is at the level of 0.26, and the difference between those two approaches is statistically significant;
• an automatic tool for event identification can be created and its average effectiveness is at the level of 0.8 (the automatic method outperformed humans and this is statistically significant);
• the speed of analysis of use cases, for the purpose of event identification, with the ad hoc approach is at the level of 2.5 steps per minute and with the HAZOP-based method is at the level of 0.5 steps per minute;
• the speed of analysis of use cases by the automated tool, for the purpose of event identification, is at the level of 10.8 steps per minute;
• using NLP tools one can automatically identify 9 bad smells in use cases which are based on the good practices proposed by Adolph et al. in their book Patterns for Effective Use Cases.
Conclusions: It is possible to build a commercial tool aimed at automatic event identification in use cases that would be a part of an intelligent use-case editor. Such a tool would support a breadth-first approach to elicitation of use cases and thus would fit the agile approach to software development.
Politechnika Poznańska
Wydział Informatyki
Identification of Events in Use Cases
Jakub Jurkiewicz
A doctoral dissertation
Supervisor: Jerzy Nawrocki, PhD, Dr Habil.
Co-supervisor: Mirosław Ochodek, PhD
Poznan, 2013
S T R E S Z C Z E N I E
Context: Requirements are one of the most important elements of software development. Gathering them involves many stakeholders, including IT professionals (analysts, project managers, architects, etc.) as well as people from outside the IT field (e.g. customers, end users, domain experts). To build a bridge between these two groups, requirements are often written in natural language, e.g. in the form of use cases. Use cases describe the interaction between an end user (called an “actor”) and the system being built. The most important element of a use case is the main scenario, which describes a typical course of interaction leading to the actor achieving a specific goal. The main scenario is a sequence of steps performed by the actor and the system. Each step describes a simple action in natural language (e.g. “Student enters personal data.”). Deviations from the main scenario are also recorded in the use case. They are usually called extensions and are described by means of alternative scenarios. Each extension is a response to the event causing the deviation. The first problem is the effectiveness of event identification, i.e. how to identify all the events causing deviations from the main scenario. Another problem is efficiency, i.e. the time needed to identify events. Event identification is very important, as unidentified events may negatively affect not only the coding stage, but also effort estimation, user interface design, user documentation writing, evaluation of the system architecture, etc. In general, three approaches to event identification are possible: 1) an ad hoc approach used in manual reviews; 2) a specialised event identification method used in manual reviews; 3) an automatic method executed by a computer.
Objective: The aim of this thesis is to propose an automatic method of event identification that would identify events no worse (in terms of effectiveness and efficiency) than manual review methods.
Method: To develop a method of automatic event identification, a set of almost 200 use cases was divided equally into a learning part and a testing part. Analysis of the learning part allowed 14 abstract event types and two inference rules to be identified. Using natural language processing tools (Stanford Parser, OpenNLP) and a use-case editor (UC Workbench), a prototype tool for automatic event identification was implemented. The tool was evaluated in terms of effectiveness and efficiency. Its results were compared with the results obtained by experiment participants who used the ad hoc approach and an approach based on the HAZOP method. A benchmark requirements specification was used to evaluate the results of the tool and of the manually performed reviews.
Results: The thesis presents the following results:
• the average effectiveness of event identification performed with the ad hoc approach is 0.18;
• the average effectiveness of event identification performed with the HAZOP-based approach is 0.26; the difference in effectiveness between the ad hoc approach and the HAZOP-based approach is statistically significant;
• the effectiveness of the automatic event identification method is 0.8 (the automatic method gives better results than manual review methods, and this difference is statistically significant);
• the speed of use-case analysis, for the purpose of event identification, is 2.5 steps per minute with the ad hoc approach and 0.5 steps per minute with the HAZOP-based approach;
• the speed of automatic use-case analysis, for the purpose of event identification, is 10.8 steps per minute;
• NLP tools can be effectively used to detect bad smells in use cases; the presented methods can automatically detect 9 bad smells.
Conclusions: It is possible to build a commercial tool aimed at automatic identification of events in use cases. Such a tool could be part of an intelligent use-case editor. It would support an iterative approach to writing use cases and would thus fit agile software development methodologies.
Acknowledgments
This thesis would not have been possible without the help of numerous people. I
would like to mention some of them here. Without their help and support this thesis
would never have been written.
First of all, I would like to thank Professor Jerzy Nawrocki for showing and guid-
ing me through the exciting world of research problems and challenges. Without
his support and priceless feedback I could never have carried out the research pre-
sented in this thesis.
Many thanks also go to Miroslaw Ochodek, who, with his knowledge and expe-
rience, helped me at every stage of my research adventure.
I would also like to thank the rest of my colleagues from the Software Engineer-
ing group, with whom I had the pleasure to collaborate: Bartosz Alchimowicz, Sylwia
Kopczynska, Michal Mackowiak, Bartosz Walter, Adam Wojciechowski. They were
always eager to listen and help in dealing with the problems I encountered.
Experiments and case studies are a crucial part of this thesis. In real life it is
really hard to conduct experiments with an extended number of IT professionals,
however, there were companies which devoted the time of their employees for the
experimental tasks. Therefore, I would like to thank the following companies and or-
ganisations for their help: GlaxoSmithKline, Poznan Supercomputing and Network-
ing Center, Talex, Poznan University of Technology (especially people from the Es-
culap project) and PayU.
I am thankful to the Authorities of the Institute of Computing Science at Poz-
nan University of Technology for creating an environment in which I could study,
develop my research skills and learn from my colleagues.
I would also like to thank the Polish National Science Center for supporting my
research under the grant DEC-2011/01/N/ST6/06794.
Last but not least, I want to thank my wife, my parents, sister and friends for
their constant support and encouragement during my work on this thesis.
Jakub Jurkiewicz, Poznan, September 2013
List of journal papers by Jakub Jurkiewicz (sorted by 5-year impact factor):
Symbols:
IF = 5-year impact factor according to Journals Citations Report (JCR) 2012
Ranking = ISI ranking for the Software Engineering category according to JCR 2012
Scholar Citations = citations according to Google Scholar, visited on 2013-09-20
Ministry = points assigned by Polish Ministry of Sci. and Higher Education as of Dec. 2012.
1. J. Jurkiewicz, J. Nawrocki, M. Ochodek, T. Glowacki. HAZOP-based identifica-
tion of events in use cases. An empirical study, Empirical Software Engineering
DOI: 10.1007/s10664-013-9277-5 (available on-line but not assigned to an is-
sue yet)
IF: 1.756; Ranking: 20/105; Scholar Citations: 0; Ministry: 30.
2. M. Ochodek, B. Alchimowicz, J. Jurkiewicz, J. Nawrocki: Improving the relia-
bility of transaction identification in use cases, Information and Software Tech-
nology, 53(8), pp. 885 - 897, 2011.
IF: 1.692; Ranking: 24/105; Scholar Citations: 5; Ministry: 30.
3. B. Alchimowicz, J. Jurkiewicz, M. Ochodek, J. Nawrocki. Building benchmarks
for use cases. Computing and Informatics, 29(1), pp. 27 - 44, 2010.
IF: 0.305; Ranking: 104/115¹; Scholar Citations: 4; Ministry: 15
List of conference papers by Jakub Jurkiewicz (sorted by the Ministry points):
1. B. Alchimowicz, J. Jurkiewicz, M. Ochodek, J. Nawrocki. Towards Use-Cases
Benchmark, Lecture Notes in Computer Science, vol. 4980, pp. 20 - 33, 2011.
Scholar Citations: 1; Ministry: 10
2. A. Ciemniewska, J. Jurkiewicz, J. Nawrocki, L. Olek: Supporting Use-Case Re-
views, Lecture Notes in Computer Science vol. 4439, pp. 424 - 437, 2007.
Scholar Citations: 7; Ministry: 10
3. A. Gross, J. Jurkiewicz, J. Doerr, J. Nawrocki. Investigating the usefulness of no-
tations in the context of requirements engineering. 2nd International Workshop
on Empirical Requirements Engineering (EmpiRE), pp. 9 - 16, 2012.
Scholar Citations: 0; Ministry: 0
4. M. Ochodek, B. Alchimowicz, J. Jurkiewicz, and J. Nawrocki. Reliability of
transaction identification in use cases. Proceedings of the Workshop on Ad-
vances in Functional Size Measurement and Effort Estimation, pp. 51 - 58,
2010.
Scholar Citations: 4; Ministry: 0
¹Category: Artificial Intelligence
Contents
List of Abbreviations IV
1 Introduction 1
1.1 Scenario-based requirements specifications and events identifica-
tion problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim and scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Use cases and their applications 5
2.1 Levels of use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Actors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Information Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Use Cases, Actors, Information Objects and Software Requirements
Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Good practices for writing use cases . . . . . . . . . . . . . . . . . . . 9
2.5.1 Specify the actors . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.2 Use simple grammar . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.3 Indicate who is the actor in a step . . . . . . . . . . . . . . . 10
2.5.4 Do not use conditional statements in a main scenario . . . 10
2.5.5 Do not write too short or too long scenarios . . . . . . . . . 11
2.5.6 Do not use technical jargon and GUI elements in steps . . . 11
2.5.7 Write complete list of possible events . . . . . . . . . . . . . 11
2.5.8 All specified events should be detectable . . . . . . . . . . . 11
2.6 Applications of use cases . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Basic NLP methods and tools 13
3.1 Sentence segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Word tagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Part-of-speech tagging . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Lemmatization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Dependency relations tagging . . . . . . . . . . . . . . . . . . . . . . . 17
4 Simple methods of quality assurance for use cases 18
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Use Cases and Bad Smells . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Natural Language Processing Tools for English . . . . . . . . . . . . . 21
4.4 Defects in Use Cases and their Detection . . . . . . . . . . . . . . . . 23
4.4.1 Specification-Level Bad Smells . . . . . . . . . . . . . . . . . . 24
4.4.2 Use-Case Level Bad Smells . . . . . . . . . . . . . . . . . . . . 25
4.4.3 Step-Level Bad Smells . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Use-case-based benchmark specification 33
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 Benchmark-oriented model of use-case-based specification . . . . 35
5.3 Analysis of uc-based specifications structure . . . . . . . . . . . . . . 36
5.4 Analysis of use-cases-based specifications defects . . . . . . . . . . . 39
5.5 Building referential specification . . . . . . . . . . . . . . . . . . . . . 42
5.5.1 Specification domain and structure . . . . . . . . . . . . . . . 42
5.5.2 Use cases structure . . . . . . . . . . . . . . . . . . . . . . . . 43
5.6 Benchmarking use cases - case study . . . . . . . . . . . . . . . . . . 45
5.6.1 Quality analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.6.2 Time analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 Identification of events in use cases performed by humans 50
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 An evolutionary approach to writing use cases . . . . . . . . . . . . . 52
6.2.1 Overview of use cases . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.2 Process of use-case elicitation . . . . . . . . . . . . . . . . . . 53
6.2.3 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3 HAZOP based use-case review . . . . . . . . . . . . . . . . . . . . . . 56
6.3.1 HAZOP in a nutshell . . . . . . . . . . . . . . . . . . . . . . . . 56
6.3.2 H4U: HAZOP for Use Cases . . . . . . . . . . . . . . . . . . . . 57
6.3.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4 Experimental evaluation of individual reviews . . . . . . . . . . . . . 59
6.4.1 Empirical approximation of maximal set of events . . . . . . 60
6.4.2 Experiment 1: Student reviewers . . . . . . . . . . . . . . . . 63
6.4.3 Experiment 2: Experienced reviewers . . . . . . . . . . . . . . 70
6.5 Usage of HAZOP keywords . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7 Automatic identification of events in use cases 78
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Use-case based requirements specification . . . . . . . . . . . . . . . 81
7.3 Inference engine for identification of events . . . . . . . . . . . . . . 82
7.3.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.3.2 Abstract model of a use-case step . . . . . . . . . . . . . . . . 82
7.3.3 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 Natural Language Processing of Use Cases . . . . . . . . . . . . . . . 91
7.4.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.4.2 NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.5 Generation of Event Descriptions . . . . . . . . . . . . . . . . . . . . . 93
7.6 Empirical evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.6.1 Reference set of events . . . . . . . . . . . . . . . . . . . . . . 93
7.6.2 Sensible events . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.3 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.6.6 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6.7 Turing test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8 Conclusions 103
A Material used in Turing test experiment 106
Bibliography 112
List of Abbreviations
A
AFP Activity Free Properties
API Application Programming Interface
ASP Activity Sensitive Properties
ATAM Architecture Tradeoff Analysis Method
ATM Automatic Teller Machine
B
BPM Business Process Modeling
BPMN Business Process Model and Notation
C
CRUD Create, Retrieve, Update, Delete
CRUD+SV Create, Retrieve, Update, Delete, Select, Validate
CYK Cocke-Younger-Kasami algorithm
D
DBMS Database Management System
E
ESE Empirical Software Engineering
F
FUR Functional User Requirements
G
GUI Graphical User Interface
H
H4U HAZOP for Use Cases
HAZOP Hazard and Operability Study
I
IT Information Technology
K
KAOS Knowledge Acquisition in automated specification
J
JAD Joint Application Development
N
NFR Non-functional Requirement
NLP Natural Language Processing
NLG Natural Language Generation
NRMSD Normalized Root Mean Squared Deviation
O
OMG Object Management Group
P
PCFG Probabilistic Context Free Grammar
POS Part of Speech
S
SD Standard Deviation
SDS Software Development Studio
SE Software Engineering
SLIM Software Life-Cycle Management
SRS Software Requirements Specification
SuD System under Development (Design, or Discussion)
SVO Subject Verb Object
T
TBD To Be Determined
TTPoints Transaction-Typed Points
U
UC Use Case
UCDB Use Cases Database
UCP Use Case Points
UML Unified Modeling Language™
V
VDM Vienna Development Method
Chapter 1
Introduction
1.1 Scenario-based requirements specifications and events
identification problem
Requirements elicitation is one of the activities of software development. Requirements
are of two kinds: functional and nonfunctional. The former describe what a
system is able to do, whereas the latter present design constraints and quality
expectations concerning time behavior, security, portability, etc.
The role of software requirements cannot be overrated. The data presented by
a Standish Group report [7] show that about 20% of IT projects never finish suc-
cessfully and almost 50% face problems with on-time delivery and rising costs. Ac-
cording to the report, poor quality of requirements and creeping scope are two of
the most important factors contributing to project failures. This observation is sup-
ported by many other researchers [19, 68, 71, 105, 106, 116, 119, 124]. Niazi et al. [93]
present data based on interviews with IT specialists. Vague, inconsistent and incom-
plete requirements were the common answer to the question about the reason for
project failures. Another well known phenomenon is defect slippage between soft-
ware development phases. According to HP [50], fixing a defect in the requirements
elicitation phase costs 3 times less than fixing the same defect in the design phase
and 6 times less than in the coding phase. Thus, quality of requirements is an im-
portant issue.
One of the ways of expressing requirements is through scenarios. Glinz [47] de-
fines a scenario in the following way:
“A scenario is an ordered set of interactions between partners, usually
between a system and a set of actors external to the system."
According to Neill et al. [92], scenarios are used in over 50% of software projects.
There are two types of scenarios: narrative and dialog-like. Narrative scenarios
describe usage of the system in real-life situations [26] and they are used to present
a vision of the proposed system. They use story-telling techniques like mentioning
the settings of the scenario, introducing agents and using plot to present actions
[25]. Narrative scenarios are written without an imposed structure. As they describe
the system at a very high abstraction level and neglect many (sometimes quite im-
portant) details, they cannot be easily used by programmers.
The best known form of dialog scenarios are use cases, introduced by Jacobson
[59], and elaborated on by Cockburn [31]. Use cases have a structure consisting
of three main parts: a name expressing the goal of interaction, a main scenario in
the form of a sequence of steps (each step is an action taken by the system or a
human), and exceptions describing the possible deviations from the main scenario.
These kinds of scenarios allow for more detailed description of the requirements,
hence they better suited to the next software development phases (e.g. architectural
design, coding, black-box testing).
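The three-part structure described above (name, main scenario, extensions triggered by events) can be sketched as a small data model. The sketch below is purely illustrative; the class and field names, and the "Withdraw cash" fragment, are our own and not a notation defined in this thesis:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    number: int
    text: str  # one action in natural language, e.g. "Customer inserts the card."

@dataclass
class Extension:
    event: str                                        # the event causing the deviation
    steps: List[Step] = field(default_factory=list)   # alternative interaction

@dataclass
class UseCase:
    name: str                                         # the actor's goal
    main_scenario: List[Step] = field(default_factory=list)
    extensions: List[Extension] = field(default_factory=list)

# A fragment of a hypothetical "Withdraw cash" use case:
uc = UseCase(
    name="Withdraw cash",
    main_scenario=[
        Step(1, "Customer inserts the card."),
        Step(2, "System asks for the PIN."),
        Step(3, "Customer enters the PIN."),
    ],
    extensions=[
        Extension("3.A. Entered PIN is incorrect.",
                  [Step(1, "System asks the Customer to re-enter the PIN.")]),
    ],
)
print(uc.name, len(uc.main_scenario), len(uc.extensions))  # → Withdraw cash 3 1
```

Identifying events then amounts to populating the `extensions` part as completely as possible for a given main scenario.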
Use cases fit both classical and agile approaches to software development very
well. Use case names correspond to user stories used to describe the software prod-
uct (they would form the Product Backlog [107]). Frequently this level of abstraction
is too high and, for instance, one of the Scrum practices is Product Backlog
refinement/grooming, in which user stories are further elaborated into smaller ones
or tasks. The result of this work could be expressed in a form identical to the main
scenario of a use case (i.e. as a sequence of actions). If needed, such a dialog could
be even further elaborated upon by identifying events and actions taken in response
to their occurrence.
Identifying events and alternative scenarios corresponding to them is quite im-
portant for making the right architectural decisions. For instance, in the ATAM method
of software architecture evaluation, one of the activities is identifying alternative
scenarios [70].
Unfortunately, there is no research concerning the effectiveness of events iden-
tification carried out by software experts and very little has been done to help peo-
ple do this effectively. According to our observations, events are identified using the ad
hoc approach (the identification is done by the Analyst alone or during JAD ses-
sions [123]). Even the ATAM method does not suggest a particular way of identifying
events [70]. It is also not known if events identification could be done in an auto-
matic way and what the effectiveness of such an approach would be. Sutcliffe et al.
[115] describe a tool supporting the Analyst in identifying events, but it is restricted
only to events associated with violating non-functional requirements, and it requires
substantial input from an expert.
1.2 Aim and scope
The overall aim of this thesis is to answer the following research questions:
• Is it possible to automate identification of exceptional situations (i.e. events)
in use cases?
• What is the effectiveness of such an approach?
In order to answer those questions, this thesis should provide the following:
• A method for automatic identification of use-case events.
• Empirical evaluation of the effectiveness and efficiency of the proposed method
for automatic identification of events in use cases.
• Comparison of the effectiveness and efficiency of the proposed automatic method
and the method of manual reviews aimed at identification of use-case events.
Use cases are written using natural language; hence, the proposed automatic
method would have to be able to analyse use cases using Natural Language
Processing (NLP) techniques and tools. It is assumed that this thesis is not aimed at
the development of these kinds of methods; however, existing NLP tools would be
used in order to analyse use cases. Moreover, it is assumed that use cases written in
English will be analysed, so NLP tools aimed at analysis of English texts will be used.
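To illustrate why NLP tools are needed at all, consider recovering the actor (the grammatical subject) and the action from a single step. The naive whitespace split below, a sketch of our own making and not part of the proposed method, handles only the simplest subject-verb-object steps and already breaks on multi-word actors; real parsing is therefore delegated to existing NLP tools:

```python
def naive_svo(step: str):
    """Naively split a simple SVO step into (subject, verb, rest).

    Works only for single-word subjects; a step such as
    "Domain Expert approves the form." would be mis-split,
    which is exactly why a real parser is needed.
    """
    words = step.rstrip(".").split()
    return words[0], words[1], " ".join(words[2:])

print(naive_svo("Student enters his account and password."))
# → ('Student', 'enters', 'his account and password')
```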
Some parts of this thesis are based on already published papers or reports. Every
chapter which is based on published material is annotated with appropriate infor-
mation and with a structured abstract showing how the chapter contributes to the
thesis.
As use cases are the main concept used in this thesis, they and their applica-
tion are described in Chapter 2. A set of good practices and patterns for authoring
use cases is also presented in that chapter. This thesis uses existing NLP methods
and tools, hence, they are described in Chapter 3. The initial research related to
this thesis was about the creation of simple methods for improving the quality of
use cases. These methods and their evaluation are presented in Chapter 4. Analysis
of existing use-case-based requirements specifications showed that the analysed
use cases were of poor quality. A significant problem was related to missing
events, so the idea of automatic events identification was born. However, in order to
be able to evaluate the automatic method, how IT professionals identify events had
to be analysed. When preparing an experiment with IT professionals, it appeared
that there was no benchmark use-case-based requirements specification which could
be used to evaluate methods and tools targeted at analysis of use cases. Therefore,
this kind of benchmark specification was created. The process of its creation
is presented in Chapter 5. Chapter 6 contains a description of the HAZOP-based
method for manual identification of use-case events. That chapter is supplemented
with empirical evaluation (with the use of the benchmark specification) of review
speed and accuracy of the method. The most significant contribution of this thesis
is presented in Chapter 7, which describes the method of automatic identification of
events in use-cases. The method is accompanied by the evaluation of the effective-
ness and efficiency of the proposed approach. Chapter 8 recapitulates the results of
the thesis, and discusses possible future improvements and extensions of the pro-
posed approaches.
1.3 Conventions
Several typographical conventions are used in the thesis. The meaning of particular
mathematical symbols and variables is explained in the place where they are used
for the first time.
new term Key concepts, terms and quotes from other authors.
KEY TERM Notes about important concepts and terms appearing in the correspond-
ing paragraph.
[12] References to books, papers, and other sources of information.
Many people contributed to the results presented in this thesis. To emphasise
this, the pronoun “we” will be used throughout the thesis. However, despite the
help and inspiration I received from other people, I am taking full responsibility for
the presented research and the results described in this thesis.
Chapter 2
Use cases and their applications
There are numerous techniques and methods for specifying functional requirements.
APPROACHES TO REQUIREMENTS SPECIFICATION
They include system shall statements [55], user stories [33], formal methods (like
VDM [23] and Z notation [58]), goal-oriented notations (e.g. the i* notation [125]) and
use cases [60]. Every method has its advantages and disadvantages, and each is aimed
at different types of software systems and software development processes. Regardless
of the diversity of notations, use cases and scenario-based approaches remain
the most popular methods (according to Neill et al. [92], scenarios and use cases are
used in over 50% of software projects).
Use cases were introduced by Ivar Jacobson in the 1980s and were fully described
in 1992 [61] as a part of an object-oriented approach to software modelling. Use cases
come in two supplementary forms: text-based and diagram-based. The latter is a part
of the Unified Modeling Language™ (UML) [98].
USE CASE DIAGRAMS
A use-case diagram consists of four main parts: actor specification, use-case name
(actor's goal) specification, the relation between the actor and the use case, and the
system boundary. Figure 2.1 presents a simple use-case diagram with the mentioned
elements.
[Figure 2.1 shows the actor Customer linked to the use case Withdraw cash inside the ATM system boundary.]
Figure 2.1: An example of a use-case diagram (1 - actor's name; 2 - an ellipse holding the name of a use case; 3 - association between the use case and the actor; 4 - boundary of the system being built).
TEXT-BASED USE CASES
Use-case diagrams present only high-level information (mainly actors' goals)
about functional requirements; hence, text-based use cases are needed to present
the details of the requirements. Textual use cases consist of three main parts:
• use-case name describing the actor's goal;
• main scenario presenting the interaction between actors; this interaction should
lead to achieving the given goal;
• section containing alternative flows, extensions and exceptions; this section
describes other ways of achieving the goal, extensions to the main scenario,
and events which may interrupt the main scenario.
An example of a text-based use case is presented in Figure 2.2.
This thesis focuses on events. An exemplary event is labeled with number
5 in Figure 2.2. It can be noticed that the main scenario describing the withdrawal
of cash can be interrupted by different events, one of which is an incorrect PIN
entered by the customer. Events can be supplemented with scenarios describing how
the system should handle the abnormal situation. Information about events is
not present in use-case diagrams; hence, this thesis focuses only on text-based
use cases.
It can be noticed that use cases are written in natural language. This makes use
cases easier for IT laymen to understand, but harder to analyse in an automated
way. There are no strict syntax rules for use cases; however, some good practices
for writing use cases have been proposed [9, 31]. Following these guidelines can
help in achieving high-quality use cases and in analysing them automatically.
Examples of good practices for authoring use cases are presented in Section 2.5.
2.1 Levels of use cases
LEVELS OF USE CASES
Functional requirements can be described at different levels of abstraction. Some
readers of an SRS are interested in actors' goals, others in the details of the
interactions which lead to achieving those goals, and still others in the low-level
operations the system is supposed to perform. In order to cover these different
perspectives, the concept of levels of use cases was introduced. Cockburn [31]
suggests three main levels for use cases:
• summary level (also called business level) is used to describe interactions be-
tween actors which lead to achieving high-level business goals. Use cases at
this level often describe business processes inside a particular company. Per-
forming scenarios from use cases at the summary level can take days, months
or even years.
ID: UC1
Name: Withdraw cash
Actor: Customer
Main scenario:
1. Customer inserts the card.
2. ATM verifies the card.
3. Customer enters the PIN number.
4. ATM verifies the entered PIN number.
5. ATM asks for the amount.
6. Customer enters the amount.
7. ATM changes the balance on customer's account.
8. ATM provides the cash and the card.
9. Customer collects the money and the card.
Alternative flows, extensions, exceptions:
4.A. The PIN number is incorrect
4.A.1. ATM informs about the problem.
4.A.2. ATM returns the card.
4.A.3. Customer collects the card.
9.A. Customer does not collect the money in 30 seconds.
9.A.1. ATM takes the money back.
9.A.2. ATM changes the balance on customer's account.
Figure 2.2: An example of a text-based use case (1 - use-case name; 2 - main scenario; 3 - section with alternative flows, extensions to the main scenario and exceptions; 4 - an example of exception consisting of an event and a scenario; 5 - event which may interrupt the main scenario).
• user level is used to describe how particular user (actor) goals are achieved.
The use case from Figure 2.2 is an example of a use case at the user level. Per-
forming scenarios from use cases at this level should take no more than a few
minutes (one session of interaction between the user and the system).
• subfunction level (also called system level) is used to describe in detail the
operations which lead to achieving user goals.
Most use cases are written at the user level, as this level provides the most
important information about how user goals should be achieved. Therefore, this
thesis focuses on user-level use cases.
2.2 Actors
ACTORS
Different kinds of actors can take part in a use case. Most commonly, an actor
represents an end user who will interact with the system being built. Customer,
Accountant and Student are names of exemplary actors representing potential end
users. Another kind of actor is a system or device which interacts with the system
being built.
Usually these actors help in achieving users' goals. Examples of this kind of
actor might include: a banking system, a hotel booking system, a geolocation device.
TYPES OF ACTORS
An actor taking part in a use case can be a primary actor or a secondary actor.
The primary actor of a use case is the actor whose goal is being achieved in the
given use case. Secondary actors are actors which support the system and the
primary actor in achieving the goal described in the use case. Actors can play
different roles in different use cases, i.e. an actor can act as a primary actor in
one use case and as a secondary actor in another.
An additional actor taking part in use cases is the system being built, often also
called the system under development (SuD).
2.3 Information Objects
Systems described in a requirements specification can operate in different business
domains, e.g. health care, banking, education, gastronomy. Every domain uses dif-
ferent terminology, types of documents and other objects holding information. To
understand a given domain, the Analyst needs to describe it in order to create a
shared view of it [103]. This can be achieved by creating a model of the domain.
The simplest approach is to create a list of information objects (also called domain
objects or business objects [97]) with a name and a short description of each of them.
Information objects can hold a collection of different pieces of information, so
additionally, every object can have the data it holds specified (e.g. the information
object patient data can hold information about the patient's name, health insurance
number and personal id number). If required, the domain model can be modelled
using UML class diagrams [98]; however, in this thesis only textual descriptions of
information objects are used.
2.4 Use Cases, Actors, Information Objects and Software
Requirements Specification
One could wonder how use cases, actors and information objects are incorporated
into an SRS. Figure 2.3 presents an exemplary SRS template [66] (it is based on
the IEEE 830-1998 [55] and ISO-9126 [57] standards and suggestions from Wiegers
[121]). This template suggests a clear definition of actors (in Chapter 3.1) and infor-
mation objects (in Chapter 3.2) and the specification of use cases for every functional
module.
Software Requirements Specification
1. Introduction
  1.1. Purpose
  1.2. Document Conventions
  1.3. Product Scope
  1.4. References
2. Overall Description
  2.1. Product Perspective
  2.2. Product Features
  2.3. Constraints
  2.4. User Documentation
  2.5. Assumptions and Dependencies
3. Business Process Model
  3.1. Actors and User Characteristics
  3.2. Business Objects
  3.3. Business Processes
  3.4. Business Rules
4. Functional Requirements
  4.x. Functional Module X
    4.x.1. Description and Priority
    4.x.2. Use Cases
    4.x.3. Specific Functional Requirements
5. Interface Characteristics
  5.1. User Interfaces
  5.2. External Interfaces
    5.2.1. Hardware Interfaces
    5.2.2. Software Interfaces
    5.2.3. Communications Interfaces
6. Non-functional Requirements
7. Other Requirements
Appendix A: Glossary
Appendix B: Data Dictionary
Appendix C: Analysis Models
Appendix D: Issues List
Figure 2.3: An SRS template with marked chapters related to use-case, actor and information object descriptions.
2.5 Good practices for writing use cases
PROBLEMS WITH USE CASES
Use cases are written in natural language; hence, they are exposed to a number of
possible quality issues. One of them is ambiguity (e.g. lexical, syntactic, referential
or elliptical). Every natural language has a set of homonyms, i.e. words which have
more than one meaning, so using a homonym in a use-case step might lead
to different interpretations of the step. Moreover, the structure of a sentence might
allow different understandings of it. Similarly, incomplete sentences can lead to mis-
understandings. Another problem related to using natural language is the subjectivity
of phrases, e.g. the step Customer collects the money quickly leaves a broad space for
interpretations, as quickly might mean different periods of time for different readers.
An additional problem is linked to a lack of precision, e.g. the step Student provides
the data does not specify exactly what data is provided, and this may lead to confusion.
GOOD PRACTICES
All these (and many other) problems were observed by practitioners and led to the
creation of guidelines and patterns for authoring use cases. Below is a list of the
good practices which are mentioned in this thesis.
2.5.1 Specify the actors
It may seem natural to move straight to writing use cases instead of focusing on
the specification of actors first. Lack of definition of some of the actors might lead to
missing some of the users' goals, which might result in omitting important system
behaviour. Therefore Adolph et al. [9] suggest the ClearCastOfCharacters pattern, saying:
Identify the actors and the role each plays with respect to the system. Clearly describe
each of the actors the system must interact with.
2.5.2 Use simple grammar
Cockburn [31] says that The sentence structure should be absurdly simple and that it
should use the grammatical structure Subject + Verb + Object. Missing any part of this
structure might make the step hard to understand, while adding additional elements to
this structure might lead to misinterpretation.
2.5.3 Indicate who is the actor in a step
It should always be clear who performs the action in a step. The actor should be the
subject of the sentence. Omitting the actor might lead to different interpretations, e.g.
the step Send the message does not indicate which actor performs the sending operation:
is it the main actor of the use case, some secondary actor, or maybe the system
being built? Writing System sends the message makes it clear which actor performs
the operation.
2.5.4 Do not use conditional statements in a main scenario
Cockburn [31] argues that using conditional statements in main scenarios does not
move the process forward and should be avoided. The step If the password is correct,
the System presents available invoices is an example of a sentence with a conditional
statement. Cockburn suggests using a validate phrase in such cases, e.g. The system
validates that the password is correct. In this way, the so-called happy day scenario is
presented in the main scenario, and all the failures and unexpected situations can
be moved to the Exceptions section of the use case. This approach is also advised
by Adolph et al. [9] in their ScenarioPlusFragments pattern, saying: Write the success
storyline as a simple scenario without any consideration of possible failures. Below it,
place story fragments that show what alternative condition might have occurred.
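The practice above lends itself to a simple automated check. The sketch below flags steps that open with a conditional marker; the keyword list is an illustrative assumption, not a method proposed in this thesis, and a real review tool would use full sentence parsing.

```python
import re

# Words that typically open a conditional statement in a use-case step.
# This keyword list is an illustrative assumption, not an exhaustive rule set.
CONDITIONAL_MARKERS = re.compile(r"^\s*(if|when|in case|unless)\b", re.IGNORECASE)

def flag_conditional_steps(steps):
    """Return the steps that look like conditional statements."""
    return [s for s in steps if CONDITIONAL_MARKERS.search(s)]

steps = [
    "Customer enters the PIN number.",
    "If the password is correct, the System presents available invoices.",
    "ATM verifies the entered PIN number.",
]
print(flag_conditional_steps(steps))
# ['If the password is correct, the System presents available invoices.']
```

A flagged step is only a warning: the final decision on rewriting it with a validate phrase stays with the Analyst.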
2.5.5 Do not write too short or too long scenarios
It is advised to write main scenarios with 3-9 steps. Too short main scenarios often
do not describe any valuable goals, and too long scenarios make the interaction hard
to read and understand. This good practice is supported by the EverUnfoldingStory
pattern from Adolph et al. [9]. The pattern states: Organize the use case set as a
hierarchical story which can be unfolded to get more detail, or folded to hide detail
and get more context. This pattern counteracts too long scenarios, as it advises using
use cases at different levels instead of writing all the information in use cases at one
level.
2.5.6 Do not use technical jargon and GUI elements in steps
If the Analyst has a technical background, it is often natural for him/her to use techni-
cal jargon and names of Graphical User Interface (GUI) elements in use-case steps.
However, Cockburn [31] reminds us: Verify that the step you just wrote captures the real
intent of the actor, not just the movements in manipulating the user interface.
2.5.7 Write complete list of possible events
Focusing only on the main scenario, which presents the most common interaction be-
tween the actors, may lead to missing additional (non-typical) system behaviour.
Therefore, Adolph et al. [9] proposed the ExhaustiveAlternatives pattern. The pattern
states: Capture all failures and alternatives that must be handled in the use case. They
also emphasize that Error handling and exception processing comprise the bulk of most
software-based systems and Having information about the variations helps develop-
ers build a robust design.
2.5.8 All specified events should be detectable
If a specified use-case event cannot be detected by the system, the scenario which
should handle this event will never be executed; therefore, it is not necessary to
specify this scenario or this event. For example, in the use case from Figure 2.2,
event 9.A. could be written as Customer bent down to tie his/her shoelace. The ques-
tion arises whether the ATM is able to detect this particular event. Probably not; hence, it is
better to write the event the way it is stated in Figure 2.2: Customer does not collect
the money in 30 seconds. This event can be detected by the ATM and can be han-
dled as stated in the use case. These kinds of issues are discussed in the Detectable-
Conditions pattern from Adolph et al. [9]. The pattern states: Include only detectable
conditions. Merge conditions that have the same net effect on the system.
2.6 Applications of use cases
One of the reasons why use cases have become popular is their possible applications
in other software development phases and activities. Once use cases are written
down, analysed and discussed with the customer and other team members, they can
be used for:
• effort estimation - one of the main methods for effort estimation is based on
use cases. The method is called Use Case Points (UCP) and was proposed by
Karner [69]. UCP takes into account the complexity of the specified actors, the com-
plexity of the use cases, and a number of technical and environmental fac-
tors. Based on these, the use-case points for the given project are calculated. These
points, together with data about previous projects, can be used to es-
timate the effort for developing the project. Another method for effort estima-
tion using use cases is called TTPoint and was proposed by Ochodek [94].
This method takes into account the specification of information objects and
the different types of transactions which can be found in use cases.
• prototyping - use cases can be integrated with prototyping activities. The UC Work-
bench tool described by Nawrocki et al. [88] allows for attaching User Interface
(UI) sketches to use-case steps. Based on these, a mockup can be generated for
easy analysis of the requirements and the sketched UI. This approach was em-
pirically validated by Olek et al. [96] with the conclusion that use cases and UI
sketches can be successfully used for prototyping web applications.
• testing - use-case scenarios can be used as acceptance-test scenarios. This
approach is advocated by McBreen [85] and validated empirically by Roubtsov
et al. [104]. Moreover, Olek [95] proposed an approach which allows for auto-
matic generation of acceptance tests from use cases.
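Karner's UCP computation mentioned in the effort-estimation bullet can be sketched roughly as follows. The weights and constants (actor weights 1/2/3, use-case weights 5/10/15, TCF = 0.6 + 0.01 · TFactor, ECF = 1.4 − 0.03 · EFactor, about 20 person-hours per point) are the commonly cited values from the literature and are stated here as an illustration, not as a normative implementation of the method.

```python
# A rough sketch of Karner's Use Case Points computation, using the
# commonly cited weights; treat the constants as illustrative assumptions.

ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}

def use_case_points(actors, use_cases, tfactor, efactor):
    """actors/use_cases: lists of complexity labels; tfactor/efactor:
    weighted sums of the technical and environmental factor ratings."""
    uaw = sum(ACTOR_WEIGHTS[a] for a in actors)          # Unadjusted Actor Weight
    uucw = sum(USE_CASE_WEIGHTS[u] for u in use_cases)   # Unadjusted Use Case Weight
    tcf = 0.6 + 0.01 * tfactor                           # Technical Complexity Factor
    ecf = 1.4 - 0.03 * efactor                           # Environmental Complexity Factor
    return (uaw + uucw) * tcf * ecf

ucp = use_case_points(
    actors=["simple", "complex"],
    use_cases=["average", "average", "complex"],
    tfactor=30,
    efactor=15,
)
effort_hours = ucp * 20   # Karner's suggested 20 person-hours per point
print(round(ucp, 1), round(effort_hours))
```

Note that unidentified events shrink the extension count of a use case and can shift its complexity class, which is exactly why incomplete event identification distorts such estimates.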
This diversity of applications shows that use cases can play a major role in software
development; hence, it is important to assure the high quality of use cases. This can
be done manually, using reviews and the mentioned good practices, and automatically,
with appropriate tools. Both approaches to improving the quality of use cases, manual
and automatic, are presented in this thesis.
Chapter 3
Basic NLP methods and tools
A survey conducted by Mich et al. [81] shows that natural language is used in the major-
ity of requirements specifications. Documents based on use cases are no different,
which means that they are often ambiguous and vague, and this can result in mis-
understandings and communication problems between stakeholders.
Participants of the mentioned survey were asked: What would be the most use-
ful thing to improve general day-to-day efficiency? Most of them (64%) answered
with automation. This shows that there is a strong need for tools which are able
to analyse and process requirements in an automatic way. A number of tools [42,
49, 77, 91, 126] have been proposed for automated processing of requirements; however,
most of them are designed for formal or language-restricted methods of require-
ments modelling. Taking into account that the majority of requirements are written in
natural language, there is a demand for tools which can process natural-language-
based requirements. This can be done, to some extent, using existing Natural Lan-
guage Processing (NLP) tools. This thesis uses two NLP tools: the Stanford Parser [5]
and the OpenNLP [1] library, which provide basic operations helping in the analysis of
natural language sentences. The remainder of this chapter presents the most important
parts of NLP tools required for effective and efficient processing of use-case steps.
3.1 Sentence segmentation
Sentence segmentation is used for distinguishing sentences within larger texts. In most
written natural languages sentences are separated by punctuation marks; however,
some of these punctuation marks can also be used for other purposes, e.g. a dot can
be used in abbreviations like Dr. or i.e. There are two main approaches to the problem
of sentence segmentation: a traditional regular-expression-based one (e.g. the Alembic
information extraction system [8]) and a trainable one (e.g. the algorithm using
regression trees by Breiman et al. [24]). The sentence detector from OpenNLP used in
this thesis uses the second approach. It may seem that sentence segmentation is not
necessary for the analysis of use-case steps, as a step consists of one sentence. In fact,
there is no formal requirement for a step to contain only one sentence, and the Analyst
can write as many sentences in one step as he/she wants. Sentence segmentation
should work as follows: for the input
System filters the data according to the given criteria. System saves the filtered data.
System displays confirmation.
it should return the following output:
• Sentence 1: System filters the data according to the given criteria.
• Sentence 2: System saves the filtered data.
• Sentence 3: System displays confirmation.
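The traditional regular-expression-based approach mentioned above can be sketched in a few lines. This is only an illustration of the idea, with a tiny hand-picked abbreviation list; OpenNLP's trainable detector, used in this thesis, is far more robust.

```python
# A minimal rule-based sentence splitter: a token ending with '.', '!' or '?'
# closes a sentence unless it is a known abbreviation.  The abbreviation list
# is an illustrative assumption.
ABBREVIATIONS = {"dr.", "i.e.", "e.g.", "etc."}

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:                       # flush a trailing unterminated sentence
        sentences.append(" ".join(current))
    return sentences

text = ("System filters the data according to the given criteria. "
        "System saves the filtered data. System displays confirmation.")
for s in split_sentences(text):
    print(s)
```

On the example input above, the sketch reproduces the three sentences listed in the text; it would fail on harder cases (quoted speech, numbers like "3.5"), which is precisely what motivates the trainable approach.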
3.2 Word tagging
This phase is used for distinguishing the separate words of a sentence (use-case step).
In the case of space-delimited languages, like English, words are separated by white
spaces, which makes word tagging simpler than for unsegmented languages (like
Chinese); however, there are still challenges to overcome. One of the challenges is
related to punctuation marks, like periods, commas or apostrophes. These elements
can have different meanings and can change the function of a word. Another challenge
is linked to multi-part words and multi-word expressions. These constructions
are often used in strongly agglutinative languages (like Turkish or Swahili), but they
can also be found in German (in noun-noun or adverb-noun constructions) and
even in English (usually with the use of a hyphen, e.g. Poznan-based). A common ap-
proach to word tagging is called the maximum matching algorithm, which uses a list
of words for the language being analysed. Based on the list, the algorithm is able to
find the separate words in the given text. The Stanford Tokenizer [4] was used in this
thesis for word tagging. Word tagging should work as follows: for the input
System displays confirmation.
it should return the following list of tagged words
• Word 1: System
• Word 2: displays
• Word 3: confirmation
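The maximum matching algorithm mentioned above can be sketched as follows: starting from the left, greedily take the longest dictionary entry that matches. The toy lexicon is an assumption for illustration, and the input has its spaces removed, since for space-delimited English the whitespace already does most of the work.

```python
# A sketch of the maximum matching algorithm: greedy longest-first lookup
# against a word list.  The lexicon here is a toy assumption.
LEXICON = {"system", "displays", "display", "confirmation", "confirm"}

def max_match(text, lexicon=LEXICON, longest=15):
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for j in range(min(len(text), i + longest), i, -1):
            candidate = text[i:j]
            if candidate in lexicon:
                words.append(candidate)
                i = j
                break
        else:                      # no dictionary word found: emit one character
            words.append(text[i])
            i += 1
    return words

print(max_match("systemdisplaysconfirmation"))
# ['system', 'displays', 'confirmation']
```

The greedy strategy prefers "displays" over the shorter "display", which is why the algorithm needs a reasonably complete word list to avoid bad segmentations.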
3.3 Part-of-speech tagging
The task of a part-of-speech (POS) tagger is to assign a part of speech (e.g. noun, verb,
adjective) to every word in a sentence. The problem with part-of-speech tagging is
that the same word can be a different part of speech, depending on the context of
the sentence. For example, the word displays in the following sentences can be a verb or
a noun:
• verb example: System displays confirmation.
• noun example: The confirmation is presented on displays.
According to DeRose's report [40], over 40% of English word types are ambiguous. The
task of part-of-speech tagging is to resolve these ambiguities, assigning the proper
part of speech to the given word. There are two main approaches to part-of-speech
tagging: rule-based and stochastic [64]. The former requires a large set of hand-
written disambiguation rules, while the latter uses a training corpus to compute the
probability of a given word being a given part of speech. In this thesis the Stanford
Part-Of-Speech Tagger [3] is used, which follows the probabilistic approach with
maximum entropy models [20] in order to assign parts of speech to words. The POS
tagger should work as follows: for the input
System displays confirmation.
it should return the following list of parts of speech:
• System: noun
• displays: verb
• confirmation: noun
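The stochastic approach can be illustrated in its simplest form: estimate P(tag | word) from a tiny hand-made "corpus" of (word, tag) pairs and pick the most probable tag for each word. The training data below is an assumption for demonstration only; real taggers, such as the maximum entropy tagger used in this thesis, also condition on the surrounding context to resolve ambiguity.

```python
from collections import Counter, defaultdict

# A toy unigram tagger: count tags per word in a tiny training set and
# choose the most frequent one.  'displays' is deliberately ambiguous
# (verb twice, noun once), mirroring the example in the text.
training = [
    ("system", "noun"), ("system", "noun"),
    ("displays", "verb"), ("displays", "verb"), ("displays", "noun"),
    ("confirmation", "noun"),
]

counts = defaultdict(Counter)
for word, tag_label in training:
    counts[word][tag_label] += 1

def tag(sentence):
    return [(w, counts[w.lower()].most_common(1)[0][0])
            for w in sentence.rstrip(".").split()]

print(tag("System displays confirmation."))
# [('System', 'noun'), ('displays', 'verb'), ('confirmation', 'noun')]
```

Because the unigram model ignores context, it would tag displays as a verb even in "The confirmation is presented on displays"; context-sensitive models exist precisely to fix this.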
3.4 Lemmatization
The goal of this phase is to find a common base form (called a lemma) of a given
word. Jurafsky et al. [64] define a lemma as:
“a set of lexical forms having the same stem, the same major part-of-
speech, and the same word-sense.”
This task is challenging because of the many possible forms of a given word
(e.g. book, books, books', book's have the same lemma book, and am, are, is have
the same lemma be). The lemmatizer from the morphology component of the Stanford
Parser [5] is used in this thesis for the task of lemmatization. The lemmatizer should
work as follows: for the input
System displays user's data.
it should return the following output
• for the word System the lemma is system
• for the word displays the lemma is display
• for the word user’s the lemma is user
• for the word data the lemma is data
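A crude lemmatizer can be sketched as an exception list for irregular forms plus a few suffix-stripping rules. The rules below are illustrative assumptions that merely reproduce the example above; the Stanford morphology component used in the thesis performs a full finite-state morphological analysis.

```python
# A minimal lemmatizer sketch: exception dictionary plus suffix rules.
# The rule set is a toy assumption, tuned only to the running example.
IRREGULAR = {"am": "be", "are": "be", "is": "be", "data": "data"}

def lemmatize(word):
    w = word.lower().replace("'s", "").strip("'")   # drop possessive markers
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies"):                           # e.g. criteria-like plurals: parties -> party
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):    # plain plural / 3rd person -s
        return w[:-1]
    return w

for token in ["System", "displays", "user's", "data"]:
    print(token, "->", lemmatize(token))
# System -> system
# displays -> display
# user's -> user
# data -> data
```

Suffix stripping alone overgeneralizes (it would turn bus into bu), which is why the exception list, and in real systems a full lexicon, is essential.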
3.5 Parsing
A general definition of parsing provided by Jurafsky et al. [64] is as follows:
“Parsing means taking an input and producing some sort of structure
for it.”
In the case of natural-language parsing, a sentence is the input and a parse tree is the
output. Parsing natural language is much more challenging than parsing a formal
language because usually several parse trees are possible for the same sen-
tence. There are two main strategies for parsing: top-down (also called goal-directed
search) and bottom-up (also called data-directed search). The Stanford Parser [72] uses
probabilistic methods for the creation of parse trees based on the CYK algorithm (using
the bottom-up approach) [64]. In order to use this approach, a Probabilistic Context-Free
Grammar (PCFG) is assumed. The output from the Stanford Parser for the sentence
System filters the data according to the given criteria.
is presented in Figure 3.1.
Parse tree:
(ROOT
  (S
    (NP (NNP System))
    (VP (VBZ filters)
      (NP (DT the) (NNS data))
      (PP (VBG according)
        (PP (TO to)
          (NP (DT the) (VBN given) (NNS criteria)))))
    (. .)))
Figure 3.1: An example of a parse tree generated by the Stanford Parser for the sentence System filters the data according to the given criteria.
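The bottom-up CYK strategy mentioned above can be sketched on a toy PCFG in Chomsky Normal Form. The grammar, its probabilities and the three-word sentence are illustrative assumptions, nothing like the Stanford Parser's actual grammar; the point is only the chart-filling idea: lexical entries go on the diagonal, then ever wider spans are built from pairs of adjacent sub-spans.

```python
import math

# Toy PCFG in Chomsky Normal Form (illustrative, not the Stanford grammar).
lexicon = {
    "system": [("NP", 1.0)],
    "displays": [("V", 1.0)],
    "confirmation": [("NP", 1.0)],
}
rules = [            # (lhs, rhs1, rhs2, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 1.0),
]

def cyk(words):
    n = len(words)
    # best[i][j] maps a nonterminal to (log-probability, backpointer)
    best = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # lexical entries on the diagonal
        for sym, p in lexicon.get(w, []):
            best[i][i + 1][sym] = (math.log(p), w)
    for span in range(2, n + 1):                        # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # split point
                for lhs, r1, r2, p in rules:
                    if r1 in best[i][k] and r2 in best[k][j]:
                        lp = math.log(p) + best[i][k][r1][0] + best[k][j][r2][0]
                        if lhs not in best[i][j] or lp > best[i][j][lhs][0]:
                            best[i][j][lhs] = (lp, (r1, i, k, r2, k, j))
    return best

def tree(best, sym, i, j):
    """Read the highest-probability tree back out of the chart."""
    bp = best[i][j][sym][1]
    if isinstance(bp, str):
        return f"({sym} {bp})"
    r1, a, b, r2, c, d = bp
    return f"({sym} {tree(best, r1, a, b)} {tree(best, r2, c, d)})"

words = "system displays confirmation".split()
chart = cyk(words)
print(tree(chart, "S", 0, len(words)))
# (S (NP system) (VP (V displays) (NP confirmation)))
```

With several competing rules, the log-probability comparison is what lets the algorithm keep only the most probable analysis for each span, which is exactly how a PCFG resolves the multiple parse trees a natural-language sentence admits.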
3.6 Dependency relations tagging
This phase provides a representation of the dependency relations between words in a
given sentence. These relations allow for distinguishing the main parts of the sentence,
such as subject, predicate and object. The types of relations used in the Stanford Parser
are described by de Marneffe et al. [36]. The grammatical relations tagger from the
Stanford Parser [37] works as follows: for the input
System filters the data according to the given criteria.
it returns the grammatical relations presented in Figure 3.2. From these relations,
and taking into account the part of speech of every word and the parse tree, the follow-
ing main elements can be distinguished for the given example:
• subject: system
• predicate: filter
• object: data
Typed dependencies:
nsubj(filters-2, System-1)
root(ROOT-0, filters-2)
det(data-4, the-3)
dobj(filters-2, data-4)
prepc_according_to(filters-2, to-6)
det(criteria-9, the-7)
amod(criteria-9, given-8)
pobj(filters-2, criteria-9)
Figure 3.2: An example of grammatical relations generated by the Stanford Parser for the sentence System filters the data according to the given criteria.
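Given typed dependencies printed in the notation of Figure 3.2, extracting the main sentence elements is mostly string processing: the dependent of nsubj is the subject, its governor is the predicate, and the dependent of dobj is the object. The sketch below is an illustrative simplification; note it keeps the surface form filters, so a separate lemmatization step (Section 3.4) would still be needed to reduce the predicate to filter as listed in the text.

```python
import re

# A subset of the typed dependencies from Figure 3.2 (Stanford notation).
deps_text = """nsubj(filters-2, System-1)
root(ROOT-0, filters-2)
det(data-4, the-3)
dobj(filters-2, data-4)
det(criteria-9, the-7)
amod(criteria-9, given-8)
pobj(filters-2, criteria-9)"""

# relation(governor-INDEX, dependent-INDEX)
DEP = re.compile(r"(\w+)\((\w+)-\d+, (\w+)-\d+\)")

def main_elements(deps_text):
    """Pick out subject, predicate and object from nsubj/dobj relations."""
    elems = {}
    for rel, gov, dep in DEP.findall(deps_text):
        if rel == "nsubj":          # nominal subject: the governor is the predicate
            elems["subject"] = dep.lower()
            elems["predicate"] = gov.lower()
        elif rel == "dobj":         # direct object of the predicate
            elems["object"] = dep.lower()
    return elems

print(main_elements(deps_text))
# {'subject': 'system', 'predicate': 'filters', 'object': 'data'}
```

This subject/predicate/object triple is precisely the Subject + Verb + Object structure that the good practice in Section 2.5.2 asks use-case steps to follow, which is what makes the relations useful for automated analysis of steps.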
Chapter 4
Simple methods of quality
assurance for use cases
Preface
This chapter contains the paper: Alicja Ciemniewska, Jakub Jurkiewicz, Jerzy
Nawrocki and Łukasz Olek, Supporting Use-Case Reviews, 10th International Confer-
ence on Business Information Systems, Lecture Notes in Computer Science, Vol. 4439,
(2007).
A deeper analysis of the results of the proposed methods is presented in Section 5.6
of Chapter 5.
My contribution to this paper included the following tasks: (1) analysis of avail-
able NLP tools; (2) development and implementation of the methods which were
able to detect 5 out of 9 bad smells 1 ; (3) evaluation of the proposed methods with 2
real-life projects; (4) implementation of NLP parsers with the use of Stanford Parser
and OpenNLP tools.
Context: The research was inspired by the book Patterns for Effective Use Cases by
Adolph et al., which contains recommendations for writing effective use cases.
We considered violations of those recommendations as bad smells in use cases
and started to think about which of them could be detected in an automatic way.
That research was my first step on the way to applying NLP techniques to improving
the quality of use cases, which later allowed the stating and solving of the problems
presented in Chapters 6 and 7.
Objective: The goal of this chapter is to propose NLP-based methods aimed at automatic detection of bad smells in use cases.
1 Duplicated use cases, Complicated Extensions, Repeated actions in neighbouring steps, Lack of the Actor, Misusing Tenses and Verb Forms.
Method: First, the patterns proposed by Adolph et al. in their book were analysed
to decide for which of them violations of the patterns, called bad smells, could be
detected in an automatic way. Next, for each of the chosen bad smells, an auto-
matic method of identification was proposed and implemented with the help
of NLP tools (Stanford Parser, OpenNLP). The usefulness of the proposed approach
was evaluated and described in another paper; see Chapter 5.
Results: We managed to propose methods which were able to automatically identify
the following bad smells in use cases:
• Duplicated use cases
• Lack of the Actor
• Too long or too short use cases
• Inconsistent style of naming
• Too complex sentence structure
• Misusing tenses and verb forms
• Using technical jargon
• Conditional steps
• Complicated extensions
• Repeated actions in neighbouring steps
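As an illustration, a check for one of the simplest of these smells, Lack of the Actor, could be sketched as below. This is a hypothetical simplification: it merely compares the first word of each step against the declared actor list, whereas the methods in this chapter rely on full NLP parsing of the step.

```python
# A hypothetical sketch of a Lack-of-the-Actor check: warn when a step does
# not start with a declared actor name.  The actor set is an assumption.
ACTORS = {"customer", "atm", "system"}

def steps_missing_actor(steps, actors=ACTORS):
    flagged = []
    for step in steps:
        first_word = step.split()[0].lower()
        if first_word not in actors:
            flagged.append(step)    # no actor as subject: who performs the action?
    return flagged

steps = [
    "Customer inserts the card.",
    "Send the message.",
    "ATM verifies the card.",
]
print(steps_missing_actor(steps))
# ['Send the message.']
```

Like all bad-smell detectors, this produces warnings, not verdicts: a flagged step still needs a human to decide whether an actor is genuinely missing.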
Conclusion: Knight et al., in their paper “An improved inspection technique” (Com-
munications of the ACM, 1993), proposed phased reviews, where initial reviews are
quite simple (similar to identifying easy-to-discover bad smells) and can be done by
a secretary or a junior developer. The methods proposed in this chapter show that
such reviews, when applied to use cases, could be done by a computer instead of a
human, and could be a useful add-on to a use-case editor.
4.1 Introduction
Use cases were invented by Ivar Jacobson [60]. They are used to describe func-
tional requirements of information systems in a natural language ([9], [31], [61],
[34]) and are becoming more and more popular. They are extensively used in var-
ious software development methodologies, including the Rational Unified Process [74]
and XPrince [90].
The quality of software requirements, described as use cases or in any other form,
is very important. The later a defect is detected, the more it costs to correct. Ac-
cording to Pressman [99], correcting a defect in requirements at the time of coding
costs 10 times more than correcting the same defect immediately, i.e. at the time
of requirements specification. Thus, one needs quality assurance. As requirements
cannot be tested, the only method is requirements review. During a review, a soft-
ware requirements document is presented to interested parties (including users and
members of the project team) for comment or approval [54]. The defects detected
during a review can be minor (not so important and usually easy to detect) or ma-
jor (important but usually difficult to find). It has been noticed that if a document
contains many easy-to-detect defects, then the review is less effective in detecting ma-
jor defects. Thus, some authors have proposed splitting reviews into two stages (Knight
calls them "phases" [73], Adolph uses the term "tier" [9]): the first stage would concen-
trate on detecting easy-to-detect defects (e.g. compliance of the document with the
required format, spelling, grammar, etc.) and the aim of the second stage would be
to find major defects. The first stage could be performed by a junior engineer or
even by a secretary, while the second stage would require experts.
In this paper a mechanism for supporting use-case reviews, based on natural
language processing tools, is presented. Its aim is to find easy-to-detect defects au-
tomatically, including use-case duplication in the document, an inconsistent style of
naming use cases, too complex sentence structures in use cases, etc. This will save the
time and effort required to perform the task by a human. Moreover, when this mecha-
nism is integrated with a requirements management tool, such as UC Workbench
developed at the Poznan University of Technology ([87], [89]), one can get instant
feedback on simple defects. This can help in learning good practices in requirements
engineering and in obtaining a more ’homogeneous’ document (which is im-
portant when requirements are collected by many analysts in a short time).
The next section describes two main concepts used in the paper: use cases and
bad smells (a bad smell is a probable defect). The capabilities of natural language pro-
cessing tools are presented in Section 3. Section 4 describes defects in use cases that
can be detected automatically. There are three categories of such defects: those concern-
ing the whole document (e.g. use-case duplication), those injected into a single use case
(e.g. an actor appearing in a use case is not described), and those concerning a single
step in a use case (e.g. a too complex structure of the sentence describing the step).
A case study showing the results of applying the proposed method to use cases written
by 4th year students is presented in Section 5. The last section contains conclusions.
4.2 Use Cases and Bad Smells
BAD SMELLS
Use cases are becoming a more and more popular way of describing functional require-
ments ([9], [31], [62]). In this approach, scenarios pointing up the interaction between
users and the system are presented using a natural language. Use cases can be writ-
ten in various forms. The most popular are ’structured’ use cases. An example of a
structured use case is presented in Figure 4.1. It consists of the main scenario and
a number of extensions. The main scenario is a sequence of steps. Each step de-
scribes an action performed by a user or by the system. Each extension contains an
event that can appear in a given step (for instance, event 3.A can happen in Step 3
of the main scenario), and it presents an alternative sequence of steps (actions) that
are performed when the event appears.
Figure 4.1: An example of a use case in a structured form.
The term 'bad smell' was introduced by Kent Beck in the context of source
code ([44]). A bad smell is a surface indication that usually corresponds to a deeper
problem in the code. Detection of a bad smell does not necessarily mean that the code
is incorrect; it should rather be considered a symptom of low readability which may
lead to a serious problem in the future.
In this paper, a bad smell is a probable defect in requirements. Since we are
talking about requirements written in a natural language, in many situations it is
difficult (or even impossible) to say definitely if a suspected defect is present or not
in the document. For instance, it is very difficult to say if a given use case is a dupli-
cation of another use case, especially if the two use cases have been written by two
people (such use cases could contain unimportant differences). Another example
is complexity of sentence structure. If a bad smell is detected, a system displays a
warning and the final decision is up to a human.
4.3 Natural Language Processing Tools for English
Figure 4.2: An example of a Combined Structure generated by the Stanford parser.

Among many natural language processing tools, the Stanford parser [10] is
the most powerful and useful tool from our point of view. The parser has many
capabilities and generates three lexical structures:
• probabilistic context-free grammar (PCFG) structure - a context-free gram-
mar in which each production is augmented with a probability
• dependency structure - represents dependencies between words
• combined structure - a lexicalized phrase-structure which carries both cate-
gory and (part-of-speech tagged) head word information at each node (Figure
4.2)
The combined structure is the most informative and useful. To tag words it uses
the Penn Treebank tag set. As an example, Figure 4.2 presents the structure of the sentence
"User enters input data". In the example the following notation is used: S - sentence,
NP - noun phrase, VP - verb phrase, NN - singular common noun, NNP - singular proper noun,
NNS - plural common noun, VBZ - verb in third person singular form.
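The tagging notation can be illustrated with a toy lookup tagger. This is a sketch only: the hand-made lexicon below is our assumption, whereas the Stanford parser infers tags statistically.

```python
# Toy Penn Treebank tagger for the example sentence (illustrative only; the
# dissertation's analysis relies on the Stanford parser, not on a lexicon).
LEXICON = {
    "User": "NNP",    # singular proper noun
    "enters": "VBZ",  # verb in third person singular form
    "input": "NN",    # singular common noun
    "data": "NNS",    # plural common noun
}

def tag(sentence: str) -> list[tuple[str, str]]:
    """Return (word, Penn-Treebank-tag) pairs using the toy lexicon.
    Unknown words fall back to NN, a common default assumption."""
    return [(w, LEXICON.get(w, "NN")) for w in sentence.split()]
```

For the sentence in Figure 4.2 this yields User/NNP enters/VBZ input/NN data/NNS, matching the notation above.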
Moreover, the Stanford parser generates grammatical structures that represent
relations between individual pairs of words. Figure 4.3 presents the typed
dependencies for the same example sentence. The notation
used to describe grammatical relations is presented in [38].
The ambiguity of natural language and the probabilistic approach used by the Stan-
ford parser cause problems with automatic language analysis. During our research
we have encountered the following problem. One of the features of the English lan-
guage is that there are some words which can be both verbs and nouns. Addition-
ally, the third person singular form of a verb is formed by adding "s" to the verb
base form, and the plural form of a noun is built in a similar way. This leads to situ-
ations in which a verb is confused with a noun. For example, the word "displays" can be
tagged as a verb or a noun. Such confusion may have great influence on the further
analysis. An example of such a situation is presented in Figure 4.4.
Figure 4.3: An example of typed dependencies generated by the Stanford parser.
Figure 4.4: An example of an incorrect sentence decomposition.
In our approach we want to give the analyst suggestions about how
the quality of use cases can be improved. When the analyst gets the information
about the bad smells, he can decide whether these suggestions are reasonable or
not. However, this problem does not seem to be crucial: in our research, which
involved 40 use cases, we have found only three sentences in which this problem
occurred.
4.4 Defects in Use Cases and their Detection
Adolph [9] and Cockburn [31] presented a set of guidelines and good practices about
how to write effective use cases. In this context "effective" means clear, cohesive,
easy to understand and maintain. Reading the guidelines one can distinguish sev-
eral types of defects in use cases. In this section we present those defects that can be
automatically detected. Each defect discussed in the paper is accompanied by a description
of an automatic detection method. The defects have been split into three groups presented
in separate subsections: specification-level bad smells (defect indicators
concerning a set of use cases), use-case level bad smells (defect indicators concern-
ing a single use case), and step-level bad smells (defect indicators concerning a single
step; a use case consists of many steps).
4.4.1 Specification-Level Bad Smells
At the level of requirements specification, where there are many use cases, a quite
common defect which we have observed is use-case duplication. Surprisingly, we
have found such defects even in specifications prepared by well-established Polish
software houses. The most frequent is duplication by information object. If there
are two different information objects (e.g. an invoice and a bill), analysts have a ten-
dency to describe the processes which manipulate them as separate use cases, even
if they are processed in the same way. That leads to unnecessarily thick documenta-
tion (the thicker the document, the longer the time necessary to read it). Moreover,
it is dangerous: when someone finds and fixes a defect in one of the use cases, the
other will remain unchanged, which can be a source of problems in the future. There
are two sources of duplicated use cases:
• Intentional duplication. An analyst prefers that style and/or wants to have
a thick document (many customers still prefer thick documents - for them
they look more serious and dependable, which is of course a myth). Some
such analysts will perhaps ignore that kind of warning, but others - more
proactive - may start to change their style of writing specifications.
• Unintentional duplication. There are several analysts, each of them is writ-
ing his own part of the specification and before the review process no one is
aware of the duplications. If this is the case, the ability to find duplicates in an
automatic way will be perceived as very attractive.
The detection method is two-phase. In the first phase a signature (fingerprint) of
each use case is computed. It can be a combination of a main actor identifier (e.g.
its number) and the number of steps the use case contains. Usually the number of steps
in a use case and the number of actors in the specification are rather small (far less
than 256), thus a signature can be a single integer. If two use cases
have the same signature, they go through the second phase of analysis, during which
they are examined step by step. Step number j in one use case is compared against
step number j in the second use case and a so-called step similarity factor, s, is com-
puted. Those similarity factors are combined into a use-case similarity factor, u. If u is
greater than a threshold then the two use cases are considered similar and a duplication
warning is generated.
A very simple implementation of the above strategy can be the following. We
know that two similar use cases can differ in an information object (a noun). More-
over, we assume that the most important information about processing an information
object is contained in verbs and nouns. Thus, the step similarity factor, si, for steps
number i in the two compared use cases can be computed in the following way:
• If all the corresponding verbs and nouns appearing in the two compared steps
are the same, then si = 1.
• If all the corresponding verbs are the same and all but one corresponding nouns
are the same and a difference in the two corresponding nouns has been ob-
served for the first time, then si = 1 and InfObject1 is set to one of the "con-
flicting" nouns and InfObject2 to the other (InfObject1 describes an informa-
tion object manipulated with the first use case and InfObject2 is manipulated
with the second use case).
• If all the corresponding verbs are the same and all but one corresponding nouns
are the same and the conflicting nouns are InfObject1 in the first use case and
InfObject2 in the second, then si = 1.
• In all other cases, si for the two analyzed steps is 0.
Use-case similarity factor, u, can be computed as a product of the step similarity
factors: u = s1 ∗ s2 ∗ ... ∗ sn.
The described detection method is oriented towards intentional duplication. To
make it effective in the case of unintentional duplication one would need a dictio-
nary of synonyms. Unfortunately, so far we do not have any.
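Assuming each step has already been reduced to its lists of verbs and nouns (in the paper this extraction is done by the Stanford parser), the two-phase strategy can be sketched as follows; the function and variable names are ours, not part of UC Workbench:

```python
# Sketch of the two-phase duplicate detection described above.

def signature(main_actor_id: int, n_steps: int) -> int:
    # Phase 1: cheap fingerprint. Both values are assumed to be < 256,
    # so the pair fits into a single integer.
    return main_actor_id * 256 + n_steps

def usecase_similarity(uc1, uc2) -> float:
    # Phase 2: step-by-step comparison. Each step is a (verbs, nouns) pair.
    # In this simple variant each si is 0 or 1, so the product u is 0 or 1.
    if len(uc1) != len(uc2):          # equal signatures imply equal lengths
        return 0.0
    inf_obj = None                    # (InfObject1, InfObject2) once seen
    for (verbs1, nouns1), (verbs2, nouns2) in zip(uc1, uc2):
        if verbs1 != verbs2 or len(nouns1) != len(nouns2):
            return 0.0                # si = 0 makes the product 0
        diffs = [(a, b) for a, b in zip(nouns1, nouns2) if a != b]
        if not diffs:
            continue                  # si = 1: verbs and nouns all match
        if len(diffs) == 1:
            if inf_obj is None:
                inf_obj = diffs[0]    # first noun conflict: remember objects
                continue              # si = 1
            if diffs[0] == inf_obj:
                continue              # si = 1: consistent substitution
        return 0.0                    # si = 0 in all other cases
    return 1.0

# "Process an invoice" vs. "process a bill": same verbs, one consistent swap.
uc_a = [(["enters"], ["invoice", "data"]), (["prints"], ["invoice"])]
uc_b = [(["enters"], ["bill", "data"]),    (["prints"], ["bill"])]
```

A single mismatching step zeroes the product u, mirroring the all-or-nothing step rules above.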
4.4.2 Use-Case Level Bad Smells
Bad smells presented in this section are connected with the structure of a single use
case. The following bad smells have been selected for automatic detection with
UC Workbench, a use-case management tool developed at the Poznan University of
Technology:
• Too long or too short use cases. It is strongly recommended [9] to keep use
cases 3-9 steps long. Too long use cases are difficult to read and understand.
Too short use cases, consisting of one or two steps, distract the reader from the
context and also make the specification more difficult to understand. To
detect this bad smell it is enough to count the number of steps in each use
case.
• Complicated extension. An extension is designed to be used when an alterna-
tive course of action interrupts the main scenario. Such an exception usually
can be handled with a simple action, after which the flow returns to the main sce-
nario or the use case ends. When the interruption causes the execution of
a repeatable, consistent sequence of steps, then this sequence should be ex-
tracted to a separate use case (Figure 4.5). Detection can be based on counting
the steps within each extension; a warning is generated for each extension with
too many steps.

Figure 4.5: Example of a use case with too complicated extension.

Wrong:
1. Administrator fills in his user name
2. Administrator fills in his telephone number
3. Administrator fills in his email

Correct:
1. Administrator fills in his user name, telephone number and email

Figure 4.6: Example of repeated actions in neighboring steps
• Repeated actions in neighboring steps.
• Inconsistent style of naming
The last two bad smells will be described in the subsequent subsections.
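The first two bad smells boil down to counting steps. A minimal sketch follows; the 3-9 range is taken from [9], while the extension threshold of three steps is our assumption:

```python
# Simple length checks for use cases and their extensions.

def length_warnings(main_steps, extensions, max_ext_steps=3):
    """main_steps: list of step texts; extensions: dict name -> list of steps.
    The max_ext_steps threshold is an assumed, tool-configurable value."""
    warnings = []
    if not 3 <= len(main_steps) <= 9:
        warnings.append("use case should have 3-9 steps")
    for name, steps in extensions.items():
        if len(steps) > max_ext_steps:
            warnings.append(f"extension {name} is too complicated "
                            f"({len(steps)} steps); extract a separate use case")
    return warnings
```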
Repeated Actions in Neighboring Steps
Every step of a use case should represent one particular action. The action may
consist of one or more moves which can be treated as a whole. Every step should
contain significant information which reflects user intent rather than a single move.
Splitting such moves into separate steps may lead to long use cases, bother-
some to read and hard to maintain.
Detection method: Check whether several consecutive steps have the same ac-
tor (subject) and action (predicate). Extraction of the subject and predicate from the
sentence is done by the Stanford parser. The analyst can be informed that such a se-
quence of steps can be combined into a single step.
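Assuming each step has already been reduced to its (subject, predicate) pair, the check can be sketched as:

```python
# Repeated-actions check on pre-extracted (subject, predicate) pairs.
# In the paper the extraction itself is done by the Stanford parser.

def repeated_action_warnings(steps):
    """steps: list of (subject, predicate) pairs, one per step."""
    warnings = []
    for i in range(len(steps) - 1):
        if steps[i] == steps[i + 1]:
            warnings.append(
                f"steps {i + 1} and {i + 2} repeat the action "
                f"'{steps[i][0]} {steps[i][1]}'; consider combining them")
    return warnings

# Figure 4.6: Administrator fills in name / phone / email in three steps.
steps = [("Administrator", "fills"), ("Administrator", "fills"),
         ("Administrator", "fills")]
```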
Wrong: Title: Main Use Case
Correct: Title: Buy a book
Figure 4.7: Example of inconsistent style of naming
Inconsistent Style of Naming
Every use case should have a descriptive name. The title of each use case presents a
goal that the primary actor wants to achieve. There are a few conventions of naming use
cases, but it is preferable to use an active verb phrase in the use case name. Further-
more, the chosen convention should be used consistently in all use cases.
Detection method: An approximate method of detecting this bad smell is to check
whether the use case name satisfies the following constraints:
• The title contains a verb in infinitive (base) form
• The title contains an object
This can be done using the Stanford parser. If the title does not fulfill these con-
straints, a warning is attached to the name.
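An approximate sketch of the title check; the list of base-form verbs is a hand-made assumption standing in for the Stanford parser's part-of-speech information:

```python
# Title check: base-form verb first, then an object.
BASE_VERBS = {"buy", "add", "remove", "send", "register", "log"}  # assumption

def title_warning(title: str):
    """Return a warning string, or None when the title looks fine."""
    words = title.lower().split()
    if not words or words[0] not in BASE_VERBS:
        return "title should start with a verb in base form"
    if len(words) < 2:
        return "title should contain an object"
    return None
```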
4.4.3 Step-Level Bad Smells
Bad smells presented in this section are connected with use-case steps. Steps occur
not only in the main scenario, but also in extensions. The following bad smells are
described here:
Too Complex Sentence Structure
The structure of a sentence used for describing each step of a use case should be as
simple as possible. It means that it should generally consist of a subject, a verb, an
object and a prepositional phrase ([9]). With such a simple structure one can be sure
that the step is grammatically correct and unambiguous. Because of the simplicity,
use cases are easy to read and understand. Such a simple structure also helps
a lot when using natural language tools. It is essential to notice that the simpler the
sentence is, the more precisely other bad smells can be discovered. An example of a
too complex sentence and its corrected version is presented below.
Wrong: The system switches to on-line mode and displays summary information
about data that have to be uploaded and downloaded
Correct: The system displays list of user's tasks

Figure 4.8: Examples of too complex sentence structure

Wrong: The form is filled in
Correct: Student fills in the form

Figure 4.9: Example of lack of the actor

Detection method: Looking at the use cases that were available to us we have
decided to build a simple heuristic that checks whether a step contains more than
one sentence, subject or predicate, coordinate clause, or subordinate clause. If the
answer is YES, then a warning is generated. Obviously, it is just a heuristic. It can
happen that a step contains one of the mentioned defect indicators but is still
readable and easy to understand. Providing a clear and correct answer in all possible
cases is impossible - even two humans can differ in their opinion of what is simple
and readable. In our research we have encountered some examples of this problem.
However, we have distinguished some rules which can be used to verify whether
a sentence is too complex and unreadable. Below we present the verification rules.
The numbers in the brackets show the applicability of a rule (in how many use cases
a given rule could be applied) and its effectiveness (in how many use cases it has
found a real defect approved by a human).
• step contains more than one sentence (2 / 2)
• step contains more than one subject or predicate (2 / 2)
• step contains more than one coordinate clause (8 / 4)
• step contains more than one subordinate clause (7 / 5)
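A rough sketch of the heuristic; the token-based clause counting below is our simplification of the parser-based checks, and the keyword list is an assumption:

```python
# Complexity heuristic approximated with simple token rules (the paper's
# version uses the Stanford parser to count subjects, predicates and clauses).
SUBORDINATORS = {"if", "when", "because", "that", "which", "while"}

def too_complex(step: str) -> bool:
    sentences = [s for s in step.split(".") if s.strip()]
    if len(sentences) > 1:
        return True                       # more than one sentence
    tokens = step.lower().replace(",", " ").split()
    if tokens.count("and") + tokens.count("or") > 1:
        return True                       # more than one coordinate clause
    if sum(t in SUBORDINATORS for t in tokens) > 1:
        return True                       # more than one subordinate clause
    return False
```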
Lack of the Actor
According to [9] it should always be clearly specified who performs the action. The
reader should know which step is performed by which actor. Thus, every step in a
use case should be an action that is performed by one particular actor. The actor's name
ought to be the subject of the sentence; thus, in most cases the name should appear
as the first or second word in the sentence. Omission of the actor's name from the step
may lead to a situation in which the reader does not know who is the executor of the
action.
Detection method: In the case of UC Workbench, every actor must be defined
and each definition is kept within the system. Therefore it can be easily verified
whether the subject of a sentence describing a step is defined as an actor.
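A sketch of the check, assuming the set of defined actors is available from the tool; looking for an actor name among the first two words is the simplification suggested above, in place of full subject extraction:

```python
# Lack-of-the-actor check. In UC Workbench every actor must be defined;
# the example actor set below is an assumption.
ACTORS = {"Student", "Administrator", "System"}

def missing_actor(step: str, actors=ACTORS) -> bool:
    """True when no defined actor appears among the first two words."""
    first_words = step.split()[:2]
    return not any(w.strip(".,") in actors for w in first_words)
```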
Misusing Tenses and Verb Forms
A frequent mistake is to write use cases from the system point of view. It is easier
for the analyst to write in such a manner, but the customer may be confused during
reading. Use cases should be written in a way which is highly readable for everyone.

Wrong: Send the message
Wrong: System is sending an email
Wrong: The email is sent by System
Correct: System sends an email

Figure 4.10: Example of misusing tenses and verb forms

Wrong: User chooses the second tab and marks the checkboxes
Correct: User chooses appropriate options

Figure 4.11: Example of using technical jargon
Therefore the action ought to be described from the user point of view. In order to
ensure this approach, the present simple tense and the active form of the verb should be
used. Moreover, the present simple tense implies that the described action is consistent
and that the system always responds in a determined way.
Detection method: Using the combined structure from the Stanford parser, the nsubj
relation [38] can be determined. It can then be checked whether the subject is an actor and
the verb is in the third person singular active form.
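A sketch of the tense check with a toy tag lookup standing in for the Stanford parser output; the tag table and actor set are assumptions:

```python
# Tense/voice check: the subject must be a defined actor and the verb
# directly after it must be tagged VBZ (third person singular, active).
TAGS = {"sends": "VBZ", "send": "VB", "sending": "VBG", "sent": "VBN"}
ACTORS = {"System", "User"}

def tense_warning(step: str):
    """Return a warning string, or None when the step looks fine."""
    words = step.split()
    if not words or words[0] not in ACTORS:
        return "subject is not a defined actor"
    if len(words) < 2 or TAGS.get(words[1]) != "VBZ":
        return "verb should be in third person singular active form"
    return None
```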
Using Technical Jargon
A use case is a way of describing essential system behavior, so it should focus on the
functional requirements. Technical details should be kept outside of the functional
requirements specification. Using technical terminology might guide developers to-
wards specific design decisions. Graphical user interface (GUI) details clutter the
story, make reading more difficult and the requirements more brittle.
Detection method: A dictionary containing terminology typical of specific
technologies and user interfaces (e.g. button, web page, database, edit
box) should be created. Then it is easy to check whether a step description contains such terms.
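The dictionary lookup can be sketched directly; the dictionary content and the naive plural handling are our assumptions:

```python
# Technical-jargon check: plain dictionary lookup over the step's words.
JARGON = {"button", "checkbox", "tab", "web", "page", "database", "edit", "box"}

def jargon_words(step: str):
    """Return jargon terms found in the step (with naive plural stripping)."""
    hits = []
    for raw in step.lower().split():
        w = raw.strip(".,")
        if w in JARGON or w.rstrip("es") in JARGON or w.rstrip("s") in JARGON:
            hits.append(w)
    return hits
```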
Conditional Steps
A sequence of actions in a use case may depend on some conditions. It is natural to
describe such situations using conditionals: "if condition then action ...". This ap-
proach is preferred by computer scientists, but it can confuse the customer. It can be
especially difficult to read when a nested "if" statement is used in a use case
step. Use cases should be as readable as possible; such a style of writing makes them
complex, hard to understand and follow.
Wrong:
1. Administrator types in his user name and password
2. System checks if the user name and the password are correct
3. If the given data is correct the system logs Administrator in

Correct:
1. Administrator types in his user name and password
2. System finds that the typed-in data are correct and logs Administrator in

Extensions:
2.A. The typed-in data are incorrect
2.A.1. System presents an error message
Figure 4.12: Example of conditional steps
It is preferable to present the optimistic scenario first (the main scenario) and to write
alternative paths separately in the extensions section.
Detection method: The easiest way to detect this bad smell is to look for specific
keywords, such as if, whether, when. Additionally, using the Stanford parser it can
be checked whether the found keyword is a subordinating conjunction. In such a case
the analyst can be notified that the step should be corrected.
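The keyword search can be sketched as follows (without the parser-based confirmation that the keyword is a subordinating conjunction):

```python
# Conditional-step detection via keyword search, as described above.
CONDITIONALS = {"if", "whether", "when"}

def conditional_warning(step: str):
    """Return a warning string, or None when no conditional keyword occurs."""
    for raw in step.lower().split():
        w = raw.strip(".,")
        if w in CONDITIONALS:
            return (f"step contains conditional keyword '{w}'; "
                    "move the alternative flow to an extension")
    return None
```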
4.5 Case Study
In this section we would like to present an example of applying our method to a set
of real use cases. These use cases were written by 4th year students participating in
the Software Development Studio (SDS) which is a part of the Master Degree Pro-
gram in Software Engineering at the Poznan University of Technology. SDS gives the
students a chance to apply their knowledge to real-life projects during which they
develop software for real customers.
Let us consider the use case presented in Figure 4.13. Using the methods presented
in Section 4.4 the following bad smells can be detected:
Misusing Tenses and Verb Forms
• Step: 3. User fill the form
• Tagging (from the Stanford parser): User/NNP fill/VBP the/DT form/NN
NNP - singular proper noun
VBP - verb in non-third person singular present form
DT - determiner
NN - singular common noun
• Typed dependencies (from the Stanford parser):
nsubj(fill-2, User-1) - nominal subject
Figure 4.13: A use case describing how to log in to the TMS Mobile system.
det(form-4, the-3) - determiner
dobj(fill-2, form-4) - direct object
• Conclusion:
From the typed dependencies we can determine the nsubj relation between
User and fill. From the tagging it can be observed that fill is used in the wrong
form (the proper form would be VBZ - verb in third person singular form).
Conditional Step
• Step: 4. The system checks if user entered proper data and loges him in
• Tagging (from the Stanford parser): The/DT system/NN checks/VBZ if /IN user/NN
entered/VBD proper/JJ data/NNS and/CC loges/VBZ him/PRP in/IN
VBZ - verb in third person singular form
IN - subordinating conjunction
VBD - verb in past tense
JJ - adjective
NNS - plural common noun
PRP - personal pronoun
• Combined structure (from the Stanford parser): Presented in Figure 4.14
• Conclusion:
As can be observed, the step contains the word if. Moreover, from the com-
bined structure we can conclude that the word if is a subordinating conjunc-
tion.
Complicated Extensions
• Extension: 4.b User enters login data for the first time
• Symptom: The extension contains three steps.
• Conclusion: The extension scenario should be extracted to a separate use
case.
Figure 4.14: Example of a use case that contains a condition.
4.6 Conclusions
So far about 40 use cases have been examined using our methods of detecting bad
smells. Almost every use case from the examined set contained a bad smell. The most
common bad smells were: Conditional Step, Misusing Tenses and Verb Forms, and
Lack of the Actor. Thus, this type of research can contribute to higher quality of
requirements specifications.
In the future it is planned to extend the presented approach to other languages,
especially to Polish, which is the mother tongue of the authors. Unfortunately, Polish is
much more difficult for automated processing and there is a lack of appropriate tools
for advanced analysis.
Acknowledgements
First of all we would like to thank the people involved in the UC Workbench project.
This work has been financially supported by the State Committee for Scientific
Research as a research grant 91-0269 (years 2006-2008).
Chapter 5
Use-case-based benchmark
specification
Preface
This chapter contains the paper: Bartosz Alchimowicz, Jakub Jurkiewicz, Mirosław
Ochodek, Jerzy Nawrocki, Building Benchmarks for Use Cases, Computing and Infor-
matics, Vol 29, No 1 (2010), pp. 27-44.
My contribution to this chapter included the following tasks: (1) full qualitative
analysis of a set of 524 real-life use cases; (2) part of the quantitative analysis of the
set of use cases; (3) introducing changes to the benchmark specification according
to the results of the qualitative analysis; (4) performing a case study with the tool for
automatic detection of bad smells described in the previous chapter.
Context: Many concepts in requirements engineering require empirical evaluation.
Such an evaluation should satisfy the following requirements: (1) evaluation exper-
iments should be replicable, and (2) they should be based on representative data
or artifacts, coming from real-life projects. In the context of use cases the prob-
lem is that real-life software specifications are of commercial value and usually they
cannot be published, which makes such an experiment non-replicable at another
evaluation site.
Objective: The goal of the research presented in this chapter was to create a bench-
mark software specification that could be used in replicable evaluation experiments
concerning methods and tools oriented toward use cases.
Method: To create the benchmark specification, 524 use cases from 16 industrial
and academic projects were analysed. The analysis was twofold: (1) quantitative
analysis focused on measuring simple attributes of use cases (e.g. number of steps
33
in main scenarios, steps with validation actions); (2) qualitative analysis focused on
measuring the frequency of bad smells in the available use cases.
Results: Based on the results of the quantitative and qualitative analysis, a profile
of a typical use-case-based specification was created. Moreover, using this profile,
two specifications resembling the typical specification were composed: (1) a clean
specification without bad smells; (2) a specification contaminated with bad smells.
Conclusion: The proposed benchmark specification can be used in experiments
concerning methods and tools for use case analysis. It has been used to evaluate
simple methods of quality assurance for use cases (presented in Chapter 4), and to
study the effectiveness of events identification performed by a human (Chapter 6) or
with the help of an automatic tool (Chapter 7). Moreover, it has already been used
by other researchers (e.g. by Rehan Rauf et al. from University of Waterloo in their
work presented at the IEEE International Conference on Requirements Engineering
RE-2011).
5.1 Introduction
Functional requirements are very important in software development. They im-
pact not only the product itself but also test cases, cost estimates, delivery date,
and user manual. One of the forms of functional requirements are use cases in-
troduced by Ivar Jacobson about 20 years ago. They are getting more and more pop-
ular in software industry and they are also a subject of intensive research (see e.g.
[18, 21, 39, 41, 110]). Ideas presented by researchers must be empirically verified (in the
best case, using industrial data) and it should be possible to replicate a given exper-
iment by any other researcher [108]. In the case of research on use cases it means that one
should be able to publish not only one's results but also the functional require-
ments (use cases) that have been used during the experiment. Unfortunately, that is
very seldom done because it is very difficult. If a given requirements specification
has a real commercial value, it is hard to convince its owner (a company) to publish it.
Thus, to make experiments concerning use cases replicable, one needs a benchmark
that would represent use cases used in real software projects.
In this paper an approach to constructing a use-case benchmark is presented. The
benchmark itself is a set of use-case-based requirements specifications which have
a typical profile observed in requirements coming from real projects. It includes
both quantitative and qualitative aspects of use cases. To derive such a profile, an
extensive analysis of 524 use cases was performed.
The paper is organised as follows. In Section 5.2 a model of use-case-based re-
quirements specification is presented. This model is further used in Sections 5.3 and
5.4, which present the analysis of the use cases coming from 16 projects. Based on
this analysis, a quantitative and qualitative profile of the typical use-case-based spec-
ification is derived, which is used to create benchmark specifications in Section
5.5. Finally, a case study is presented in Section 5.6, which demonstrates a typical
usage of the benchmark specification to compare tools for use-case analysis.
5.2 Benchmark-oriented model of use-case-based
specification
Although use cases have been successfully used in many industrial projects (Neil et
al. [92] reported that 50% of projects have their functional requirements presented
in that form), they have never been a subject of any recognisable standardisation.
Moreover, since their introduction by Ivar Jacobson [61], many approaches to their
development have been proposed. For example, Russell R. Hurlbut [53] gathered fifty-
seven contributions concerning use-case modeling. Although all the approaches share
the same idea of presenting an actor's interaction with the system in order to obtain
his goal, they vary in their level of formalism and presentation form. Thus it is very
important to state what is understood by the term use-case-based requirements
specification.
To mitigate this problem, a semi-formal model of use-case-based requirements
specification has been proposed; it is presented in Figure 5.1. It incorporates most
of the best practices presented by Adolph et al. [9] and Cockburn [31].
However, it still allows one to create a specification that contains only a scenario (or a non-
structured story) and actors, which is enough to compose a small but valuable use
case. This specification can be further extended with other elements like extensions
(using steps or stories), pre- and post-conditions, notes or triggers.
Unfortunately, understanding the structure of the specification is still not enough
to construct a typical use-case-based specification. What is missing is
the quantifiable data concerning the number of model elements and the proportions
between them1. In other words, one meta-question has to be asked for each element
of the model: "how many?".
Another aspect is the content of use cases. Here, what is required is knowl-
edge about the language structures that people use to author use cases. This in-
cludes also a list of typically-made mistakes.
1 A number/proportion of any element of the model will be further addressed as a property of the specification.
Figure 5.1: Use-Cases-based functional requirements specification model (a UML class diagram relating Requirements Specification, Use Case, Actor, Business Object, Business Rule, Goal Level (Business, User, Sub-function), Trigger, Condition, Scenario, Step, Story and Extension, together with main/secondary actors, pre-/post-conditions and sub-scenarios)
5.3 Analysis of uc-based specifications structure
In order to discover the profile of a typical specification, data from various projects
was collected and analysed. As a result a database called UCDB (Use Cases Database)
[6] has been created. At this stage it stores data from 16 projects with a total number
of 524 use cases (basic characteristics of requirements specifications as well as short
description of projects are presented in Table 5.1).
Specifications have been analysed in order to populate the model with the in-
formation concerning average number of its elements and additional qualitative as-
pects of use cases (see Section 5.4).
One of the most interesting findings is that 65.8% of the main scenarios in use
cases consist of 3-9 steps, which means that they fulfil the guidelines proposed by
Cockburn [31]. What is more, 72.1% of the analysed use cases have alternative sce-
narios (extensions). There are projects in which use cases have extensions attached
to all steps in the main scenarios; however, on average a use case contains between 1 and
2 extensions (mean 1.57). Detailed information regarding the number and distribu-
tions of steps in main scenarios and extensions is presented in Figure 5.2.
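The two quantitative measures reported above can be computed directly from per-use-case data; the sample records below are illustrative, not taken from UCDB:

```python
# Sketch of the quantitative profiling step: share of main scenarios with
# 3-9 steps, and mean number of extensions per use case.
use_cases = [
    {"main_steps": 4, "extensions": 1},
    {"main_steps": 7, "extensions": 3},
    {"main_steps": 12, "extensions": 0},
    {"main_steps": 5, "extensions": 2},
]

in_guideline = sum(1 for uc in use_cases if 3 <= uc["main_steps"] <= 9)
share_3_to_9 = 100.0 * in_guideline / len(use_cases)        # percentage
mean_extensions = sum(uc["extensions"] for uc in use_cases) / len(use_cases)
```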
Another interesting observation concerning the structure of use cases is that
Table 5.1: Overview of the projects and their requirements specifications (origin: industry - project developed by a software development company, s2b - project developed by students for an external organisation, external - specification obtained from an external source freely accessible through the Internet; projects D and K come from the same organisation). Columns: ID, specification language, origin, number of use cases, and the share of use cases at the Business / User / Sub-function goal levels.

ID         Language  Origin  Use cases  Business  User  Sub-function
Project A  English   S2B     17         0%        76%   24%
Project B  English   S2B     37         19%       46%   35%
Project C  English   -       39         18%       44%   33%
Project D  -         -       77         0%        96%   4%
Project E  -         S2B     41         0%        100%  0%
Project F  -         -       10         0%        100%  0%
Project G  English   -       90         0%        81%   19%
Project H  -         -       16         19%       56%   25%
Project I  -         -       21         38%       57%   5%   (banking system)
Project J  -         -       9          0%        67%   33%
Project K  -         -       75         0%        97%   3%
Project L  English   -       16         0%        31%   69%
Project M  English   -       26         0%        23%   77%
Project N  English   -       18         0%        0%    100%
Project O  English   -       16         0%        31%   69%
Project P  English   -       16         0%        25%   75%

Project descriptions given in the table: web & standalone application for managing members of an organization; web-based Customer Relationship Management (CRM) system; UK Collaboration for a Digital Repository (UKCDR) [external]; web-based e-government Content Management System (CMS) [Polish, industry]; web-based Document Management System (DMS) [Polish]; web-based invoices repository for remote accounting [Polish, industry]; Protein Information Management System (PIMS) [external]; integration of two sub-systems in an ERP-scale system [Polish, industry]; single functional module for a web-based e-commerce solution [Polish, industry]; web-based workflow system with Content Management System (CMS) [Polish, industry]; Polaris - Mission Data System (MDS) process demonstration [external]; Vesmark Smartware™ - financial decision system [external]; Photo Mofo - digital image management [external]; iConf - Java-based conference application [external]; One Laptop Per Child - web-based content management [external].
the sequences of steps performed by the same actor are frequently longer than one.
What is more interesting, this tendency is more visible in the case of main actor steps
(38.4% of main actor step sequences are longer than one step, compared with only
25.5% in the case of the secondary actor - in most cases the system being developed).
One of the potential reasons is that the actions performed by the main actor are more
important from the business point of view. This contradicts the concept of a transaction
presented by Ivar Jacobson [60]. Jacobson enumerated four types of actions which
together form a use-case transaction. Only one of them belongs to the main actor
(the user request action). The rest of them are system actions: validation, internal
state change, and response. It might seem that those actions should frequently form
longer sequences. However, 74.6% of step sequences performed by the system consist of a single step.
The distributions and number of steps in main actor sequences are presented in Figure 5.3.

Figure 5.2: Scenario lengths in the analysed use cases: a) histogram of the number of steps in the main scenario (data aggregated from all projects), b) box plot of the number of steps in the main scenario (in each project), c) histogram of the number of steps in an extension (data aggregated from all projects), d) box plot of the number of steps in an extension (in each project; note that project E was excluded because it has all alternative scenarios written as stories)
If we look deeper into the textual representation of the use cases, some interest-
ing observations concerning their structure can be made. One of them regards the
way authors of use cases describe validation actions. Two different approaches are
observed. The most common is to use extensions to represent alternative system
behaviour in case of failure of the verification process (41.3% of extensions are of this
kind). The second, less frequent, approach is to incorporate validation actions
into steps (e.g. System verifies data). Such actions are observed in only 3.4%
of steps.
The analysed properties can be classified into two separate classes. The first one
contains properties which are observed in nearly all of the projects, with compara-
ble intensity. Such properties seem to be independent of the specification they
belong to (and of its author's writing style). The second class is the opposite: it
includes properties which are either characteristic only of a certain set of use cases,
or whose occurrence in different specifications lies between the two extremes of
full presence and marginal/none. Such properties are project-dependent.
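The split between the two classes can be illustrated with a simple variability measure; the sketch below uses the coefficient of variation over made-up per-project intensities (the 0.5 threshold is an arbitrary illustration, not a rule from this analysis):

```python
from statistics import mean, stdev

# Hypothetical per-project intensities (fractions) of two properties.
properties = {
    "steps_with_validation": [0.03, 0.04, 0.05, 0.03, 0.04],  # similar in all projects
    "use_cases_with_triggers": [1.0, 0.0, 0.0, 1.0, 0.0],     # all-or-nothing per project
}

for name, values in properties.items():
    cv = stdev(values) / mean(values)  # coefficient of variation
    kind = "specification-independent" if cv < 0.5 else "specification-dependent"
    print(f"{name}: cv={cv:.2f} -> {kind}")
```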
Figure 5.3: Distributions of actor step-sequence lengths in the analysed use cases: a) histogram of the lengths of the step sequences performed by the main actor, b) histogram of the lengths of the step sequences performed by the secondary actor - e.g. the System
A more detailed description of the analysed requirements specifications is presented in
Table 5.2.
5.4 Analysis of use-cases-based specifications defects
Use cases seem simple in their structure and easy to understand; however, it is easy
to introduce defects, which might influence the communication between analysts,
customers and other project stakeholders. Different studies [7, 93] indicate that vague,
inconsistent or incomplete requirements are among the main reasons for project failures.
Defects make a specification cumbersome, tedious and hard to read, analyse and change;
hence they can lead to misunderstandings and finally to higher costs of software
projects. Based on the patterns and guidelines presented by Adolph et al. and Cock-
burn [9, 31], a set of defect types has been distinguished. The projects stored in UCDB
have been analysed in order to collect statistical data about defects found in the re-
quirements specifications. Table 5.3 presents the results of this analysis.
One of the main observations is that almost 86% of the analysed use cases con-
tain at least one defect, which proves that it is not easy to provide a requirements
specification of good quality. For researchers who try to analyse use-case quality,
knowledge of the average defect density can be used to tune their methods and
tools in order to make them more effective and efficient.
A large number of different defects can be found in most of the projects. For ex-
ample, steps without an explicitly specified actor can be found in 13 projects, 11 projects
Table 5.2: Results of the structure analysis of the use cases stored in UCDB

Requirements-specification-independent properties (Overall, then projects A-P):

Property                                   Overall  A      B      C      D      E      F      G      H      I      J      K      L      M      N      O      P
Number of use cases                        524      17     37     39     77     41     10     90     16     21     9      75     16     26     18     16     16
Steps in main scenario (mean)              4.82     4.76   4.92   2.95   4.34   5.78   3.90   5.10   3.38   4.33   3.67   6.36   3.44   6.58   5.22   2.88   3.63
Steps in main scenario (SD)                2.41     0.97   1.46   2.69   2.25   1.26   1.20   2.41   0.50   2.01   1.32   3.25   1.67   1.50   1.40   1.20   1.50
Use cases with extensions                  72.1%    94.1%  51.4%  41.0%  55.8%  92.7%  100%   68.9%  81.3%  90.5%  55.6%  98.7%  75.0%  100%   38.9%  50.0%  62.5%
Extensions per use case (mean)             1.57     1.29   0.57   0.95   0.92   1.63   1.00   1.28   2.94   2.33   1.00   2.69   1.75   4.46   0.39   0.63   0.88
Extensions per use case (SD)               1.88     0.77   0.60   1.72   1.12   0.62   0.00   1.30   2.98   1.58   1.58   2.91   1.57   1.61   0.50   0.81   0.81
Steps in extension (mean)                  2.46     1.80   2.52   1.00   1.45   N/A    1.10   2.77   2.30   1.64   1.44   3.09   N/A    N/A    2.67   1.30   1.36
Steps in extension (SD)                    1.61     0.84   0.87   0.00   0.59   N/A    0.32   1.87   0.65   0.67   0.53   1.80   N/A    N/A    2.89   0.48   0.50
Steps with validation actions              3.4%     3.7%   5.5%   11.3%  1.8%   0.0%   0.0%   0.0%   27.8%  17.6%  3.0%   0.0%   0.0%   15.2%  0.0%   2.2%   3.4%
Extensions which are validations           41.3%    9.1%   76.2%  64.9%  63.4%  40.3%  100%   50.4%  78.7%  89.8%  100%   15.3%  60.7%  21.6%  0.0%   33%    50.0%
Main actor's sequence length = 1           61.6%    42.3%  93.1%  88.6%  37.0%  61.9%  87.5%  91.7%  47.4%  52.4%  50.0%  52.6%  45.5%  67.4%  52.8%  71.4%  45.0%
Main actor's sequence length = 2           21.6%    23.1%  3.5%   6.8%   34.8%  36.1%  12.5%  0.0%   0.0%   19.1%  42.9%  16.4%  27.3%  32.7%  33.3%  14.3%  10.0%
Main actor's sequence length = 3           10.2%    26.9%  3.5%   4.6%   16.3%  0.0%   0.0%   8.3%   31.6%  23.8%  7.1%   12.9%  22.7%  0.0%   11.1%  14.3%  30.0%
Main actor's sequence length = 4           3.8%     7.7%   0.0%   0.0%   6.5%   0.0%   0.0%   0.0%   21.1%  0.0%   0.0%   8.6%   4.6%   0.0%   2.8%   0.0%   15.0%
Main actor's sequence length > 4           2.8%     0.0%   0.0%   0.0%   5.4%   2.1%   0.0%   0.0%   0.0%   4.8%   0.0%   9.5%   0.0%   0.0%   0.0%   0.0%   0.0%
Secondary actor's sequence length = 1      74.6%    82.6%  96.3%  72.2%  67.9%  100%   100%   90.0%  50.0%  12.5%  62.5%  72.1%  100.0% 0.0%   0.0%   60.0%  10.0%
Secondary actor's sequence length = 2      18.8%    17.4%  3.7%   22.2%  29.6%  0.0%   0.0%   8.0%   16.7%  43.8%  37.5%  14.8%  0.0%   83.3%  68.4%  40.0%  0.0%
Secondary actor's sequence length = 3      4.3%     0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   2.0%   33.3%  25.0%  0.0%   7.4%   0.0%   16.7%  26.3%  0.0%   0.0%
Secondary actor's sequence length = 4      1.6%     0.0%   0.0%   5.6%   2.5%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   4.1%   0.0%   0.0%   5.3%   0.0%   0.0%
Secondary actor's sequence length > 4      0.8%     0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   18.8%  0.0%   1.6%   0.0%   0.0%   0.0%   0.0%   0.0%

Requirements-specification-dependent properties:

Property                                   Overall  A      B      C      D      E      F      G      H      I      J      K      L      M      N      O      P
Use cases with additional description      38.4%    0.0%   0.0%   100%   0.0%   0.0%   0.0%   100%   56.3%  100%   0.0%   6.7%   100%   96.3%  0.0%   0.0%   6.3%
Use cases with sub-scenario                12.4%    0.0%   0.0%   0.0%   32.5%  0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   44.2%  0.0%   0.0%   33.3%  0.0%   0.0%
Steps in sub-scenario (mean)               1.92     0.00   0.00   0.00   3.20   0.00   0.00   0.00   0.00   0.00   0.00   1.09   0.00   0.00   1.33   0.00   0.00
Steps in sub-scenario (SD)                 1.62     N/A    N/A    N/A    2.02   N/A    N/A    N/A    N/A    N/A    N/A    0.29   N/A    N/A    0.52   N/A    N/A
Use cases with pre-conditions              37.4%    82.4%  100%   0.0%   0.0%   97.6%  0.0%   0.0%   0.0%   23.8%  77.8%  0.0%   81.3%  38.5%  0.0%   0.0%   56.3%
Use cases with post-conditions             14.3%    0.0%   0.0%   97.4%  0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   100%   100%   0.0%   0.0%   12.5%
Use cases with triggers                    33.0%    100%   100%   100%   0.0%   100%   0.0%   0.0%   68.8%  57.1%  0.0%   0.0%   0.0%   0.0%   0.0%   100%   0.0%
Steps with reference to use cases          6.4%     0.0%   0.5%   0.0%   0.0%   0.0%   25.6%  14.2%  0.0%   0.0%   6.1%   33.3%  0.0%   0.6%   0.0%   0.0%   0.0%
Extensions with scenario                   66.8%    27.3%  100%   8.1%   87.3%  0.0%   100%   100%   100%   100%   100%   100%   0.0%   0.0%   42.9%  100.0% 100%
Extensions with stories                    33.2%    72.7%  0.0%   91.9%  12.7%  100%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   100%   100%   57.1%  0.0%   0.0%
Explicitly defined Business Rules          N/A      N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A
Table 5.3: Results of the quality analysis of the use cases stored in UCDB

Use-case-set level:

Property                                            Overall  A       B       C       D       E       F       G       H       I       J       K       L       M       N       O       P
Use case duplicates (the same actions
operating on different business objects)            2.55%    0.00%   0.00%   0.00%   0.00%   26.83%  0.00%   2.22%   0.00%   0.00%   0.00%   0.00%   6.25%   0.00%   0.00%   0.00%   56.25%
Lack of hierarchical structure (Y/N)                -        Y       Y       Y       N       Y       N       N       Y       Y       N       N       N       N       Y       Y       Y

Use-case level:

Use cases whose name does not describe the goal     4.76%    11.76%  0.00%   51.28%  0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   31.25%  6.25%
Non-detectable conditions in extensions             4.25%    4.55%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   8.51%   0.00%   0.00%   14.85%  0.00%   0.00%   0.00%   0.00%   0.00%
Extensions without scenarios                        35.80%   77.27%  0.00%   86.49%  6.49%   100%    0.00%   5.22%   14.89%  8.16%   0.00%   0.00%   100%    100%    0.00%   100%    0.00%
Extensions with nested scenarios                    4.13%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   8.51%   8.16%   0.00%   12.87%  0.00%   0.00%   0.00%   0.00%   0.00%

Scenario level:

Use cases with less than 3 steps in main scenario   11.39%   0.00%   0.00%   51.28%  11.69%  0.00%   20.00%  15.56%  0.00%   9.52%   22.22%  0.00%   37.50%  7.69%   0.00%   37.50%  25.00%
Use cases with more than 9 steps in main scenario   6.80%    0.00%   0.00%   5.13%   7.79%   0.00%   0.00%   5.56%   0.00%   0.00%   0.00%   36.00%  0.00%   0.00%   0.00%   0.00%   0.00%
Lack of interactions between actors                 22.11%   11.76%  10.81%  56.41%  48.05%  9.76%   0.00%   0.00%   50.00%  28.57%  0.00%   12.00%  37.50%  3.85%   44.44%  87.50%  56.25%
Use cases with at least double nested sub-scenarios 1.53%    0.00%   0.00%   0.00%   10.39%  0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   1.33%   0.00%   19.23%  0.00%   0.00%   0.00%

Step level (steps from main scenario and from extensions):

Steps with different tenses used                    0.80%    1.11%   0.85%   5.08%   0.00%   0.00%   0.00%   0.00%   0.00%   5.95%   0.00%   0.18%   0.00%   0.00%   6.86%   0.00%   5.17%
Steps in which user-interface terms are used        4.56%    0.00%   0.42%   0.00%   0.00%   0.00%   0.00%   12.87%  0.00%   0.00%   2.17%   0.36%   1.82%   0.00%   47.06%  16.67%  1.30%
Steps in which technical terms are used             0.29%    0.00%   0.00%   0.85%   0.24%   0.00%   0.00%   0.00%   0.00%   1.19%   0.00%   0.09%   0.00%   0.00%   5.88%   0.00%   0.00%
Steps in which no actor is specified                19.37%   0.00%   1.69%   38.14%  0.94%   0.84%   0.00%   45.05%  0.68%   0.60%   0.00%   0.54%   0.00%   0.58%   13.73%  51.67%  20.78%
Steps in which passive voice is used                3.63%    1.11%   0.00%   55.93%  0.24%   0.84%   0.00%   2.70%   0.00%   0.00%   0.00%   0.27%   0.00%   0.00%   13.73%  31.67%  16.88%
Steps with complex sentence structure or
more than one sentence                              2.51%    0.00%   5.08%   27.12%  0.71%   0.00%   0.00%   1.16%   0.68%   2.38%   2.17%   1.45%   1.82%   2.92%   9.80%   1.67%   2.60%
Steps with possibility of different interpretations 3.79%    0.00%   16.95%  11.02%  0.00%   0.00%   4.00%   0.77%   0.00%   0.00%   4.35%   4.09%   1.82%   12.28%  6.86%   11.67%  2.60%
Steps with details of elements of business objects  3.71%    4.44%   0.85%   3.39%   3.30%   0.00%   0.00%   11.45%  0.00%   0.00%   8.70%   0.91%   0.00%   7.60%   0.98%   3.33%   0.00%
Steps with conditional clauses                      1.22%    0.00%   0.00%   4.24%   0.71%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   3.00%   0.00%   0.58%   3.92%   1.67%   0.00%
Steps with language mistakes                        3.16%    0.00%   47.03%  0.00%   0.00%   0.00%   0.00%   0.00%   2.05%   0.60%   0.00%   0.00%   0.00%   0.00%   2.94%   0.00%   5.19%
have some extensions missing alternative scenarios, and the same number of projects
have use cases with less than 3 steps. Some of the defects are, again, unique to
some specifications. For instance, language mistakes were found mainly in project
B (over 47% of its steps have language mistakes), while most of the projects did not
have such problems. Scenarios longer than 9 steps are another example of a project-
specific defect (it mainly relates to Project K, which has over 30% of use cases with
this defect). The simple explanation for this is that different authors have different
writing styles and are probably not aware of some problems and defects.
Another interesting finding is that one defect can lead to another defect. The
analysis shows that steps without an actor tend to be written using passive voice
(projects C, N, O, P). A similar situation can be observed in the case of language
mistakes and complex step structure, which often lead to different possible
interpretations (projects B, C, N).
5.5 Building referential specification
By combining the use-case model and the average values coming from the analysis,
a profile of the typical use-cases-based requirements specification can be derived. In
such a specification all properties would appear with the typical intensity. In other
words, an analysis performed on such a document could be perceived as an every-day
task for use-case analysis tools.
There are at least two approaches to acquiring an instance of such a specification.
The first one would be to search for an industrial set of use cases which fulfils the
given criteria. Unfortunately, most industrial documents are confidential; there-
fore they cannot be used at a large scale. The second, obvious approach would
be to develop such a specification from scratch. If it is built according to the ob-
tained typical-specification profile, it might be used instead of an industrial specifica-
tion. Since it would describe some abstract system, it could be freely distributed and
used by anyone.
Therefore, we would like to propose an approach to developing an instance of the
referential specification for benchmarking purposes. The document is available at
the web site [6] and might be freely used for further research.
5.5.1 Specification domain and structure
It seems that a large number of tools for use-case analysis have their roots in the
research community. Therefore, the domain of the developed specification should
be easy to understand, especially for academic staff. One of the well-recognised
processes at a university is the student admission process. Although the developed
specification describes a hypothetical process and tools, it should be easy to understand
for anyone who has ever participated in a similar process.
Another important decision concerns the structure of the specification (the number
of actors, use cases and business objects). Unfortunately, this decision is rather ar-
bitrary, because the number of use cases, actors and business objects may depend on
the size of the system and the level of detail incorporated into its description. In this
case the median numbers of use cases and actors from the UCDB set have been used
as the numbers of use cases and actors in the constructed specification. The first ver-
sion of the specification consisted of 7 actors (Administrator, Candidate, Bank, Selection
committee, Students Management System, System, User), 37 use cases (3 business,
33 user, and 1 sub-function level), and 10 business objects. After extending UCDB
with data from 5 additional projects, the median number of use cases decreased to
20. Therefore, we decided to prepare two versions of the specification. The larger
one has 37 use cases and is labelled FULL, while in the smaller specification
(SMALL) the number of use cases was reduced to 20.
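As a quick cross-check, the per-project use-case counts extracted from Table 5.1 can be fed to a median computation (illustrative only; the text reports a rounded median of about 20 for the extended UCDB):

```python
from statistics import median

# Use-case counts per project (A-P), as listed in Table 5.1.
counts = [17, 37, 39, 77, 41, 10, 90, 16, 21, 9, 75, 16, 26, 18, 16, 16]

# For an even number of projects, the median is the midpoint
# of the two middle values (here 18 and 21).
print(median(counts))
```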
5.5.2 Use cases structure
A breadth-first rule has been followed in order to construct the referential specification.
Firstly, all use cases were titled and augmented with descriptions. Secondly, all of
them were filled with main scenarios and corresponding extensions. During that
process all changes made to the specification were recorded in order to check confor-
mance with the typical profile. This iterative approach was used until the final ver-
sions of both the SMALL and FULL quantitative specifications were constructed. Their
profiles are presented in Table 5.4, in comparison to the corresponding average val-
ues from the analysis of the UCDB projects. The most important properties are those
which are observed in all of the specifications (specification-independent). For those
metrics, most values in the referential document are very close to those derived from
the analysis. The situation differs for properties which were characteristic only
of certain projects (or whose variability between projects was very high). If the con-
structed specification is supposed to be a typical one, it should not be biased by fea-
tures observed only in some of the industrial specifications. Therefore, dependent
properties were also incorporated into the referential use cases; however, they were
not subject to the tuning process.
In addition, based on each of the quantitative specifications, a corresponding
qualitative specification was proposed. In this case we focused on introducing changes
which led to obtaining a qualitative profile close to the typical profile.
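The conformance check performed after each change can be thought of as comparing the current profile against the target profile within some tolerance; the sketch below is a conceptual illustration only (the target values come from the UCDB averages in Table 5.4, while the 5% tolerance is an arbitrary choice, not the dissertation's):

```python
# Target profile (UCDB averages) vs. the profile of the specification being tuned.
target = {"mean_steps": 4.82, "uc_with_extensions": 0.721, "mean_extensions": 1.57}
current = {"mean_steps": 4.81, "uc_with_extensions": 0.703, "mean_extensions": 1.50}

def conforms(current, target, rel_tol=0.05):
    """True when every tracked property is within rel_tol of its target value."""
    return all(abs(current[k] - target[k]) <= rel_tol * target[k] for k in target)

print(conforms(current, target))  # → True
```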
Table 5.4: Referential specifications profiles in comparison to the average profile derived from the analysis of the use cases stored in UCDB (Quant. = Quantitative Referential Specification Admission System 2.0, Qual. = Qualitative Referential Specification Admission System 2.0, UCDB = observed in Use-Cases Database)

Property                                            Quant. SMALL  Quant. FULL  UCDB    Qual. SMALL  Qual. FULL
Number of steps in main scenario (mean)             4.80          4.81         4.82    -            -
Number of steps in main scenario (SD)               1.21          1.50         2.41    -            -
Use cases with extensions                           75.0%         70.3%        72.1%   -            -
Number of extensions in use case (mean)             1.60          1.50         1.57    -            -
Number of extensions in use case (SD)               0.71          0.69         1.88    -            -
Number of steps in extension (mean)                 2.29          2.49         2.46    -            -
Number of steps in extension (SD)                   1.81          2.12         1.61    -            -
Steps of validation nature                          4.2%          2.3%         3.4%    -            -
Extensions of validation nature                     62.5%         46.2%        41.3%   -            -
Main actor's sequence length = 1                    64.5%         61.7%        61.6%   -            -
Main actor's sequence length = 2                    19.4%         21.7%        21.6%   -            -
Main actor's sequence length = 3                    12.9%         10.0%        10.2%   -            -
Main actor's sequence length = 4                    3.2%          3.3%         3.8%    -            -
Main actor's sequence length > 4                    0.0%          3.3%         2.8%    -            -
Secondary actor's sequence length = 1               73.5%         79.7%        74.6%   -            -
Secondary actor's sequence length = 2               23.5%         15.3%        18.8%   -            -
Secondary actor's sequence length = 3               2.9%          3.4%         4.3%    -            -
Secondary actor's sequence length = 4               0.0%          1.7%         1.6%    -            -
Secondary actor's sequence length > 4               0.0%          0.0%         0.8%    -            -
Use case duplicates                                 -             -            2.55%   5.00%        2.70%
Use cases whose name does not describe the goal     -             -            4.76%   5.00%        5.41%
Non-detectable conditions in extensions             -             -            4.25%   4.17%        5.13%
Extensions without scenarios                        -             -            35.80%  33.33%       33.33%
Extensions with nested scenarios                    -             -            4.13%   4.17%        5.13%
Use cases with less than 3 steps in main scenario   -             -            11.39%  10.00%       10.81%
Use cases with more than 9 steps in main scenario   -             -            6.80%   5.00%        5.41%
Lack of interactions between actors                 -             -            22.11%  25.00%       24.32%
Use cases with at least double nested sub-scenarios -             -            1.53%   0.00%        2.70%
Steps with different tenses used                    -             -            0.80%   0.78%        0.80%
Steps in which user-interface terms are used        -             -            4.56%   4.59%        4.78%
Steps in which technical terms are used             -             -            0.29%   0.00%        0.40%
Steps in which no actor is specified                -             -            19.37%  19.53%       19.12%
Steps in which passive voice is used                -             -            3.63%   3.91%        3.59%
Steps with complex sentence structure or
more than one sentence                              -             -            2.51%   2.34%        2.39%
Steps with possibility of different interpretations -             -            3.79%   3.91%        3.98%
Steps with details of elements of business objects  -             -            3.71%   3.91%        3.59%
Steps with conditional clauses                      -             -            1.22%   1.56%        1.59%
Steps with language mistakes                        -             -            3.16%   3.13%        3.19%
5.6 Benchmarking use cases - case study
Having an example of the typical requirements specification, one can wonder how
it could be used. Firstly, researchers who construct methods and tools to
analyse use cases often face the problem of evaluating their ideas. Using a spec-
ification coming from an industrial project is a typical approach; however, this can-
not lead to the conclusion that the given solution would give the same results for
other specifications. The situation looks different when the typical specification is
considered: then researchers can assume that their tool would work for most of the
industrial specifications. Secondly, analysts who want to use some tools to analyse
their requirements can have a problem choosing the best tool that would meet their
needs. With the typical specification in mind, it can be easier to evaluate the avail-
able solutions and choose the most suitable one. To conclude, the example of the
typical requirements specification can be used:
• to compare two specifications and assess how similar a given specification is
to the typical one
• to assess the time required for a tool to analyse a requirements specification
• to assess the quality of a tool/method
• to compare tools and choose the best one
In order to demonstrate the usage of the constructed specification, we have con-
ducted a case study using the Qualitative Reference Specification Admission System
2.0 FULL. As a tool for managing requirements in the form of use cases, we chose the
UC Workbench tool [87], which has been developed at Poznan University of Tech-
nology since 2005. Additionally, we used a set of tools for defect detection
[30] to analyse the requirements. The aim of these tools is to find potential
defects and present them to the analyst during the development of the requirements
specification. The tools are based on Natural Language Processing (NLP) meth-
ods and use the Stanford parser [37] and/or OpenNLP [1] tools to perform the NLP
analysis of the requirements.
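To give a flavour of what such step-level checks do, here is a deliberately simplified sketch; the actual tools rely on full NLP parsing (Stanford parser / OpenNLP), not regular expressions, and the actor list here is hypothetical:

```python
import re

# Hypothetical actor names for an admission-system specification.
KNOWN_ACTORS = ("System", "Candidate", "Administrator", "Student")

def check_step(step):
    """Flag a few of the step-level defect types from Table 5.3 (heuristic sketch)."""
    defects = []
    if not step.startswith(KNOWN_ACTORS):
        defects.append("no actor specified")
    if re.search(r"\b(is|are)\s+\w+ed\b", step):
        defects.append("passive voice")
    if step.rstrip().count(".") > 1:
        defects.append("more than one sentence")
    return defects

print(check_step("The data is verified."))   # actor missing + passive voice
print(check_step("System verifies data."))   # no defects flagged
```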
5.6.1 Quality analysis
In order to evaluate the quality of the tools, a confusion matrix [43] will be used (see
Table 5.5).
The symbols used in Table 5.5 are described below:
• T (true) - a tool answer that is consistent with the expert decision
• F (false) - a tool answer that is inconsistent with the expert decision
Table 5.5: Confusion matrix

                           Expert answer
                           Yes      No
System answer   Yes        TP       FP
                No         FN       TN
• P (positive) - a positive system answer (defect occurs)
• N (negative) - a negative system answer (defect does not occur)
On the basis of the confusion matrix, the following metrics can be calculated:
• Accuracy (AC) - the proportion of the total number of predictions that were cor-
rect, calculated using Equation 5.1:

    AC = (TP + TN) / (TP + FN + FP + TN)    (5.1)

• True positive rate (TP rate) - the proportion of positive cases that were correctly
identified, calculated using Equation 5.2:

    TP rate = TP / (TP + FN)    (5.2)

• True negative rate (TN rate) - the proportion of negative cases that were classified
correctly, calculated using Equation 5.3:

    TN rate = TN / (TN + FP)    (5.3)

• Precision (PR) - the proportion of the predicted positive cases that were correct,
calculated using Equation 5.4:

    PR = TP / (TP + FP)    (5.4)
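These metrics are simple ratios over the matrix cells; a minimal sketch with invented counts (not the case-study data):

```python
def confusion_metrics(tp, fn, fp, tn):
    """Metrics of Equations 5.1-5.4 computed from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "AC": (tp + tn) / total,     # Equation 5.1
        "TP_rate": tp / (tp + fn),   # Equation 5.2
        "TN_rate": tn / (tn + fp),   # Equation 5.3
        "PR": tp / (tp + fp),        # Equation 5.4
    }

# Invented counts for illustration.
m = confusion_metrics(tp=8, fn=2, fp=2, tn=88)
print(m["AC"], m["PR"])  # → 0.96 0.8
```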
The referential use-case specification was analysed by the mentioned tools (the
Stanford library was used to perform the NLP processing) in order to find the 10 types
of defects described in [30]. The aggregated values of the above accuracy metrics are
as follows:
• AC = 0.99
• TP rate = 0.97
• TN rate = 0.99
• PR = 0.80
This shows that the developed tools for defect detection are rather good; how-
ever, the precision should be better, so the researchers could conclude that more
investigation is needed in this area.
If researchers worked on their tool using only defect-prone or defect-free
specifications, the results could be distorted and could lead to misleading con-
clusions (e.g. that the tool is significantly better than it is, or that it gives very poor
outcomes). Having access to the typical specification allows researchers to explore
the main difficulties the tool can encounter when working with industrial specifications.
5.6.2 Time analysis
One of the main characteristics of a tool being developed is its efficiency. Not only
developers are interested in this metric, but also the users - e.g. when analysts
have to choose between two tools which give the same results, they would choose
the one which is more efficient. However, it can be hard to assess the efficiency
just by running the tool on arbitrary data, as this may result in a distorted outcome.
The use-cases benchmark gives the opportunity to measure the time required by a tool
to complete its computations in an objective way. It can also be valuable for re-
searchers constructing tools to consider using different third-party components in
their applications.
For instance, in the case of the mentioned defect-detection tool there is a possibil-
ity of using one of the available English grammar parsers. In this case study two of
them will be considered - the Stanford parser and OpenNLP. If one exchanges those
components, the tool will still be able to detect defects, but its efficiency and accu-
racy may change. Therefore, a time analysis was performed to assess how using
those libraries influences the efficiency of the tool. The overall time required to anal-
yse the typical specification² for the tool using the Stanford parser was 53.11 seconds,
and for the one with OpenNLP it was 29.46 seconds. Although the first value
seems large, the examined tool needed on average 280 ms to analyse a single
use-case step, so it should not be a problem to use it in an industrial environment.
Of course, when comparing efficiency, one has to remember that other factors (like
the quality of the results, memory requirements or the time needed for the initial-
ization of the tools) should be taken into account, as they can also have a significant
impact on the tool itself. Therefore, time, memory usage and quality analyses are pre-
sented in Table 5.6. These summarised statistics allow researchers to choose the best
approach for their work.
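A measurement of this kind can be scripted along the following lines; `analyse_step` is a placeholder for one defect-detection pass over a use-case step (e.g. a parser call), not the dissertation's actual tool:

```python
import time

def time_analysis(analyse_step, steps, runs=5):
    """Mean wall-clock time to analyse all steps, plus the per-step mean."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for step in steps:
            analyse_step(step)
        totals.append(time.perf_counter() - start)
    overall = sum(totals) / runs
    return overall, overall / len(steps)

# Toy stand-in workload: lower-casing 100 identical steps.
overall, per_step = time_analysis(str.lower, ["Student enters password."] * 100)
```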
²Tests were performed for the tools in two versions: with the Stanford and OpenNLP parsers. Each version of the tools analysed the referential specification five times, on a computer with a Pentium
Table 5.6: Case-study results (tools are written in Java, so memory was automatically managed – garbage collection)

Summarised for all components:
Parser     Overall processing time [s]   Mean time per step [s]   Maximal memory utilization [MB]
Stanford   53.11                         0.28                     212
OpenNLP    29.46                         0.16                     338

English grammar parser only:
Parser     Overall processing time [s]   Startup time [s]   Initial memory usage [MB]   Memory usage per processed element [KB]
Stanford   37.01                         0.8                27                          539
OpenNLP    13.58                         15.4               225                         3977

Quality:
Parser     AC     TP rate   TN rate   PR
Stanford   0.99   0.97      0.99      0.80
OpenNLP    0.99   0.83      0.99      0.79
Although this specification describes an abstract system, it exhibits the typi-
cal phenomena of industrial requirements specifications. The conducted case
study shows that the developed referential use-case specification can be used in dif-
ferent ways by both researchers and analysts.
5.7 Conclusions
In this paper an approach to creating a set of referential use-cases-based requirements
specifications was presented.
In order to derive a profile of a typical use-cases-based requirements specifica-
tion, 524 use cases were analysed. The analysis proved that some of the use-case
properties are project-independent (they are observed in most of the projects). These
have been used for creating the referential specification. On the other hand, there
are also a number of properties which depend very much on the author.
In order to present the potential usage of the benchmark specification, a case study
was conducted. Two sets of tools for defect detection in requirements were com-
pared from the point of view of efficiency and accuracy.
In this paper a second release of UCDB is analysed, which was recently extended
from 432 to 524 use cases. A positive observation is that, despite the fact that the
number of use cases increased by 21%, the typical profile did not change much.
Therefore it seems that the typical profile derived from the use cases stored in the
database is stable and should not differ much in future releases.
Core 2 Duo 2.16 GHz processor and 2 GB RAM.
Acknowledgments.
We would like to thank Piotr Godek, Kamil Kwarciak and Maciej Mazur for support-
ing us with industrial use cases.
The research presented at CEE-SET'08, which forms part of this paper, was
financially supported by the Polish Ministry of Science and Higher Education under
grant N516 001 31/0269.
Additional case studies and analyses have been financially supported by the Foun-
dation for Polish Science Ventures Programme, co-financed by the EU European Re-
gional Development Fund.
Chapter 6
Identification of events in use cases
performed by humans
Preface
This chapter contains the paper: Jakub Jurkiewicz, Jerzy Nawrocki, Mirosław
Ochodek, Tomasz Głowacki, HAZOP-based identification of events in use cases. An
empirical study, accepted to the Empirical Software Engineering Journal, DOI:
10.1007/s10664-013-9277-5.
My contribution to this chapter included the following tasks: (1) elaboration of
the H4U method; (2) conducting the experiments with students and with IT profes-
sionals; (3) analysis of the results of the experiments; (4) interpretation of the results
of the experiments.
Context:
In order to fully evaluate the automatic method of identification of
events in use cases presented in Chapter 7, the effectiveness and efficiency of man-
ual approaches had to be evaluated first. No existing method aimed at identifi-
cation of events was found; therefore, a method based on the HAZOP approach
is proposed. The method is compared to an ad hoc approach. The accuracy and review
speed of both methods are evaluated in experiments with students and IT profes-
sionals.
Objective: The goal of this study was to check (1) whether HAZOP-based event identifica-
tion is more effective than an ad hoc review and (2) what the review speed of these two
approaches is.
Method: Two controlled experiments were conducted in order to evaluate the ad hoc
approach and the H4U method of event identification. The first experiment included 18
students, while the second experiment was conducted with the help of 64 profes-
sionals. In both cases, the accuracy and review speed of the investigated methods
were measured and analysed. Moreover, the usage of HAZOP keywords was anal-
ysed. In both experiments, a benchmark specification based on use cases was used.
Results: The first experiment with students showed that a HAZOP-based review is
more effective in event identification than an ad hoc review and this result is statis-
tically significant. However, the reviewing speed of HAZOP-based reviews is lower.
The second experiment with IT professionals confirmed these results. These experi-
ments also showed that event completeness is hard to achieve. On average it ranged
from 0.15 to 0.26.
Conclusions: HAZOP-based identification of events in use cases is a useful alter-
native to ad hoc reviews. It can achieve higher event completeness at the cost of an
increase in effort.
6.1 Introduction
Use cases introduced by [59] have gained a lot of attention in the software engineer-
ing community since their introduction in the 1980s ([92]). Roughly speaking, a use
case is a sequence of steps augmented by a list of events and the alternative steps
associated with them that respond to those events. Both the steps and the events
are written in a natural language. This makes reading and understanding software
requirements specification easier, especially if the readers are IT laymen.
One of the main attributes that can be assigned to use cases (and any other form
of requirements) is completeness ([55]). In the context of use cases, completeness
can have several meanings: e.g. functional completeness (the set of use cases covers
all the functional features of the system) or formal completeness (there are no TBDs
— [55]). In this paper the main concern is event-completeness: requirements spec-
ification presented in the form of use cases is event-complete if all the events that
can occur during the execution of the use cases are named in this specification.
[114] indicate that missing error handling conditions account for most software
project rework. This increases project costs and also leads to delayed delivery. In
the context of use cases, these error handling conditions relate directly to use-case
events. Moreover, many authors ([83, 120, 28, 35, 9]) indicate that the list of events
should be complete, e.g. [9] propose a pattern called Exhaustive Alternatives which
states: Capture all failures and alternatives that must be handled in the use case. Thus,
one needs an effective method of checking the event-completeness of a given set
of use cases. Such a method would also be useful in the context of the spiral re-
quirements elicitation recommended by [9]. Their approach is based on the breadth
before depth pattern. Firstly, they suggest listing all the actors and their goals, then
adding a main success scenario to each goal. After a pause, used to reflect on the
requirements described thus far, one should continue identifying the events and
adding alternative steps for each event. In this paper, two methods of events iden-
tification are studied experimentally: ad hoc and H4U (HAZOP for Use Cases). The
former does not assume any particular process or tools: The Analyst is given a set of
use case scenarios and he/she has to identify the possible events in any way he/she
prefers. The second method is based on the HAZOP review process ([102]).
The paper aims at answering the following questions: 1) What level of complete-
ness can one expect when performing events identification (with H4U and ad hoc
approaches)? 2) How should we perform events identification in order to obtain the
maximum completeness? 3) How can we minimize the effort necessary to obtain
the maximum level of completeness?
The paper starts with a short presentation of the 9-step procedure to elicit re-
quirements specification in Section 6.2. That procedure is an implementation of
the breadth before depth strategy recommended by Adolph et al., and it shows a
context for events identification (however, other methods of requirements specifi-
cation based on use cases can also benefit from H4U). Details of the H4U method
along with the example are presented in Section 6.3. To evaluate H4U against ad
hoc review, two controlled experiments have been conducted: with students and
with practitioners. The design of the experiments and their results are presented in
Section 6.4. Both experiments focus on individual reviews. An interesting aspect of
the conducted experiments is usage of HAZOP keywords. Those data are presented
in Section 6.5. Related work is discussed in Section 6.6.
6.2 An evolutionary approach to writing use cases
6.2.1 Overview of use cases
A use case consists of five main elements (see Figure 6.1): (1) a header including
the name of a use case; the name indicates the goal of the interaction; (2) a main
scenario describing the interaction between the actor and the described system;
the main scenario should present the most typical sequence of activities leading to
achievement of the goal of the interaction; (3) extensions consisting of (4) events
that can occur during the execution of the main scenario and (5) alternative steps
taken in response to the events. The focus of this paper is on the fourth element of
use cases, i.e., the events.
Moreover, use cases can specify requirements on different levels: summary goals,
user goals, subfunctions ([31]). The user goals level is the most important one as it
shows the main interactions between the user and the system. Events in use cases at
this level should relate either to decisions made by the user regarding the path taken
to obtain his/her goal or obstacles which the user can encounter when performing
actions leading him/her to achieving his/her goal. Use cases present functional be-
haviour requirements and should not include information related to architectural
decisions or non-functional requirements (NFRs), such as events related to the per-
formance or to internal architectural components. Non-functional requirements are
specified separately using, for instance, ISO 25010 characteristics ([56]).
As with every other element, the form and the quality of events depend heavily
on the skills of the author of a use case. However, some experts ([9]) have proposed
best practices concerning writing high quality use-case events. The most impor-
tant practice is called Detectable Conditions. It suggests describing only those events
which can be detected by the system. If the system cannot detect an event, then it
cannot react to it; hence, specifying those events is useless as they can blur the use
case.
ID: UC1
Name: Edit a blog post
Actor: Author
Main scenario:
1. Author opens a blog post.
2. Author chooses the edit option.
3. System prompts for entering the changes.
4. Author enters the changes to the post.
5. Author confirms the changes.
6. System stores the changes.
Extensions:
2.A. Author does not have permissions to edit the given post.
2.A.1. System displays an error message.
Figure 6.1: An exemplary use case with a header (1), main scenario (2), extensions (3) consisting of an event (4) and an alternative sequence of steps (5).
6.2.2 Process of use-case elicitation
There are two contexts in which event identification is useful: review of use cases
and elicitation of use cases.
In the context of use-case review, a reviewer (or an inspection team) is given a
set of use cases and he/she tries to determine if those use cases are event-complete.
In order to do this, one could identify a set of events associated with each step and
check if every event from this set has its counterpart in the specification under re-
view.
In the context of use case elicitation, the practice of spiral development sug-
gested by [9] can be very useful. It can be supported by another practice called
breadth before depth: first an overview of use cases should be developed (almost
all the use cases should be named), and detailed description should be added at the
next stage. These two practices go well with the approach proposed by [31]: “Expand
your use cases into Scenario Plus Fragments by describing the main success scenario
of each use case, naming the various conditions that can occur, and then fleshing out
some of the more important ones”.
In our work the patterns of spiral development and breadth before depth have
been elaborated into a 9-step process for requirements specification elicitation. That
process supports the XPrince methodology ([90]) and is presented in Figure 6.2 as a
use case. Description of the problem, which is the main outcome of Step 1, is based
on RUP’s project statement ([75]). The business process described in Step 2 can be
extended if necessary with other forms of knowledge description, such as examples
of information objects (e.g. Invoice) if the domain requires this level of detail. A
context diagram created in Step 3 allows better understanding of the overview of the
system, the actors interacting with the system, and the interfaces which the system
will need to use. The use-case diagrams developed in Step 4 present the goals of the
actors. The domain objects, identified in Step 5, are the objects the system will
manipulate or present to the user. These objects represent the physical objects of
the real world. Having those objects, one can check the completeness of the set of use
cases from the point of view of CRUD (Create, Retrieve, Update, Delete) operations ([84]). In
Step 6 each use case manipulating a domain object is classified as C, R, U or D, and
if any of those letters is missing for a given domain object, then the analyst should
provide an explanation or extend the set of use cases to include the missing opera-
tion (this kind of analysis was proposed by [121]). In Step 7 a main scenario for each
use case is provided. Events which may occur when main scenarios are executed
are identified in Step 8; here the proposed method (H4U) is used. Step 9 completes
each use case with alternative steps assigned to each event.
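The CRUD completeness check of Step 6 can be sketched in a few lines. The function and the coverage mapping below are hypothetical, assumed only for illustration; the paper does not prescribe any particular tooling.

```python
# Sketch of the CRUD completeness check from Step 6 (hypothetical data model):
# each domain object maps to the set of operations (C, R, U, D) covered by the
# use cases that manipulate it; any missing letter must be explained by the
# analyst or covered by an additional use case.

def missing_crud(coverage):
    """Return, per domain object, the CRUD operations not covered by any use case."""
    full = {"C", "R", "U", "D"}
    return {obj: sorted(full - ops) for obj, ops in coverage.items() if ops != full}

coverage = {
    "BlogPost": {"C", "R", "U", "D"},   # fully covered by existing use cases
    "Comment":  {"C", "R"},             # no update/delete use case yet
}
print(missing_crud(coverage))  # {'Comment': ['D', 'U']}
```

A non-empty result flags domain objects for which the analyst should either extend the set of use cases or document why the operation is out of scope.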
From the above it follows that whenever the analyst wants to review use cases
or elicit them in a systematic way he or she will be confronted with the problem of
events identification.
Name: Elicit functional use cases
Actor: Analyst
Main scenario:
1. Analyst describes the problem, its implications and the proposed solution.
2. Analyst acquires domain knowledge and presents it as a set of business processes.
3. Analyst outlines the solution with a context diagram.
4. Analyst extends the context diagram to a use-case diagram.
5. Analyst identifies domain objects through analysis of domain knowledge described in step 2.
6. Analyst checks the completeness of the use cases operating on the domain objects.
7. Analyst provides a main scenario for each use case.
8. Analyst annotates each use-case step with events that may possibly occur when the step is executed.
9. Analyst provides for each event an alternative sequence of steps.
Figure 6.2: The 9-step process of use-case elicitation presented as a use case.
6.2.3 The problem
In this paper the focus is on step #8, identification of events which may occur when
the main scenario is executed. To be more precise, one can formulate the problem
in the following way:
Problem of events identification: Given a step of a use case, identify a set of events
that may occur when the step is executed. Identify all possible events assuming that
the system is formally correct and exclude the events related to non-functional aspects
of the system.1
Obviously, the events unrelated to the step, but occurring in parallel to it just by
chance, should be excluded — we are only interested in the cause-effect relationship
between an event and a step in the main scenario that causes the change in the
executed scenario.
There can be several approaches to the problem of events identification. Typi-
cally events in use cases are identified as a result of an ad hoc process by an analyst
or customer who try to guess what kind of events or exceptions can occur while use-
case scenarios are being performed. However, a special approach, oriented strictly
towards the problem of events identification could also be used. An example of such
a method is H4U (discussed in detail in Section 6.3). Based on this method the
following goal arises: Analyze the methods of event identification for the purpose of
evaluation with respect to accuracy and speed from the point of view of an analyst.
1 In order to facilitate the identification of events in the experiments presented in Section 6.4, we gave the participants the freedom to identify any kinds of events they wanted to. However, it resulted in a visible number of events that had to be rejected from the analysis.
6.3 HAZOP based use-case review
The proposed H4U method (short for HAZOP for Use Cases) for events identification
is based on a well-known review approach for mission-critical systems called HAZOP
([102]). The sections below describe the details of the H4U method and discuss how
it differs from the original HAZOP approach.
6.3.1 HAZOP in a nutshell
HAZOP (Hazard and Operability Study) is a structured technique for the identifi-
cation and analysis of hazards in existing or planned processes and designs ([29,
76, 122, 102]). Moreover, HAZOP allows for the distinguishing of possible causes of
hazards and their implications, together with available or future protection steps.
The method was initially intended for the analysis of chemical plants. Later it was
adapted for analysis of computer-based systems.
The method involves a team of experts who use special keywords that guide
them through identification of so called deviations (deviations are counterparts of
events in use-case steps). There are two kinds of keywords: primary and secondary.
Primary keywords refer to attributes (Temperature, Pressure, etc.). Secondary key-
words indicate deviations (NO, MORE, etc.). Considering a fragment of a chemical
or IT system, the inspection team of experts tries to interpret a pair of keywords
(e.g., Pressure - MORE could be interpreted as too much pressure in a pipe).
HAZOP uses the following secondary keywords ([102]):
• NO: the design intent is not achieved at all,
• MORE: an attribute or characteristic exceeds the design intent,
• LESS: an attribute or characteristic is below the design intent,
• AS WELL AS: the correct design intent occurs, but there are additional ele-
ments present,
• PART OF: part of the design intent is obtained,
• REVERSE: the opposite of the design intent occurs,
• OTHER THAN: there are some elements present, but they do not fulfill the
design intent,
• EARLY: the design intent happens earlier than expected,
• LATE: the design intent happens later than expected,
• BEFORE: the design intent happens earlier in a sequence than expected,
• AFTER: the design intent happens later in a sequence than expected.
At the end of a HAZOP session, a report is created (see Figure 6.3). The report
includes, among other things, two columns: Attribute (primary keyword) and Key-
word (secondary keyword), which together describe an event that can occur when
the system is running.
Item | Attribute | Keyword | Cause | Consequences | Protection | Questions
Figure 6.3: HAZOP report sheet
6.3.2 H4U: HAZOP for Use Cases
To solve the problem of events identification, adapting the HAZOP method is
proposed. The first question is how to interpret the terms “design intent” and
“deviation”. The H4U method is targeted at analysis of use cases, therefore:
• design intent is the main scenario of a use case;
• deviation is an event that can appear when the scenario is executed and re-
sults in an alternative sequence of steps.
There are two kinds of deviations: functional and non-functional. Functional devi-
ations from the main scenario are all those events that can appear even when the
system works correctly. An example could be an attempt to buy a book which is not in
the store or paying with an expired credit card. Non-functional deviations follow from
a violation of nonfunctional requirements such as correctness, availability, security,
etc. (see e.g. ISO 25010 characteristics [56]). Examples include user session time-
out, unacceptable slowdown, etc. Those events shouldn’t be specified within use
cases because they are cross-cutting and could appear in almost all the steps, which
would make a use case unreadable. Therefore, the deviations as well as the design
intent should be kept separate as they belong to two different dimensions. Non-
functional deviations could also be identified using HAZOP but, according to the
separation of concerns principle, they should be specified elsewhere and therefore
they are out of the scope of the H4U method.
The previously presented HAZOP secondary keywords are generic keywords used
in process industries with the addition of the last four keywords introduced in [102]
where HAZOP for software is presented. These four keywords were introduced in
order to analyze the timing aspect of software. It was decided not to alter the pre-
sented set of secondary keywords. There are other sets of keywords that could be
used, e.g. the guide words from the SHARD method proposed by [100], however, the
aim of this paper is to investigate how the well known HAZOP method can influence
events identification.
The original HAZOP primary keywords are specific to chemical installations and
they should be replaced with a set which would fit computer-based systems. To
achieve this, a set of 50 industrial use-case events was analyzed by two experts. As
a result the following primary keywords have been identified:
• Input & output data — relates to objects that are exchanged between external
actors and the system;
• System data — relates to objects already stored in the system;
• Time — relates to the “moment of time” when a step is being executed (e.g.,
dependencies between actions, pre- and post- conditions of actions); it does
not relate to non-functional aspects such as performance (e.g. response time)
or security (user session timeout).
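Under the assumption that every primary/secondary keyword pair is treated as a candidate prompt for the reviewer, the checklist implied by the two keyword sets can be sketched as follows (variable names are illustrative, not part of the method's definition):

```python
from itertools import product

# H4U keyword sets as listed in the text: 3 primary and 11 secondary keywords.
PRIMARY = ["Input & output data", "System data", "Time"]
SECONDARY = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE",
             "OTHER THAN", "EARLY", "LATE", "BEFORE", "AFTER"]

# Each pair is a prompt the reviewer tries to interpret for the current step,
# e.g. ("Input & output data", "NO") -> "Author did not provide data."
pairs = list(product(PRIMARY, SECONDARY))
print(len(pairs))  # 33 candidate prompts per use-case step
```

Not every pair yields a sensible event for a given step; the reviewer records only the interpretable ones in the H4U report table.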
To check the completeness of the above set of primary keywords, a small em-
pirical study has been performed. 300 use-case events have been investigated by
experts in the field, and it was possible to present each of them as a pair consisting of
one of the proposed primary keywords and one of the standard secondary keywords; thus, one can
assume that the proposed set of primary keywords is complete. Since the set of pri-
mary keywords consists of only 3 elements, it is easy to memorize them; therefore
there is no need to include them as a checklist during the review process.
The H4U method can be used in two ways: as an individual review method and
as a team review method. The overall process of the H4U method is presented in
the form of a use case in Figure 6.4. As an output, a report is produced according
to the H4U report table presented in Figure 6.5. The report should consist of the
following elements: Use case ID — ID of a use case, Step number — the number
of the step in the main scenario, Primary keyword — the primary keyword used to
identify an event, Secondary keyword — the secondary keyword used to identify an
event, Event — the identified event.
ID: UC1
Name: Identification of an event
Actors: Reviewer (can also be a Review Team)
Main flow:
1. Facilitator presents the H4U method to the Reviewer.
2. Facilitator provides the Reviewer with a set of main scenarios of use cases.
3. Reviewer reads the main scenarios step by step.
4. Reviewer uses primary and secondary keywords to identify the events and annotates the steps with the events.
5. Reviewer presents the results to the Facilitator.
Figure 6.4: Overall process of the H4U method.
6.3.3 Example
This section presents an example of how the H4U method could be used to identify
events in an exemplary use case from Figure 6.1. An analyst goes through every step
Use case ID | Step number | Primary keyword | Secondary keyword | Event
Figure 6.5: H4U report table.
of the use case and uses the proposed primary and secondary keywords in order to
identify as many events as possible. Let us assume that the analyst is currently ana-
lyzing Step 4: Author enters the changes to the post. Events which might be identified
using the primary and secondary keywords analysis are presented in Figure 6.6.
Use Case ID | Step number | Primary Keyword | Secondary Keyword | Event
UC1 | 4 | Input & output data | NO | Author did not provide data.
UC1 | 4 | Input & output data | MORE | Author exceeds the limit of characters per single post.
UC1 | 4 | Time | LATE | Author changes the post after the allowed time passed.
Figure 6.6: Exemplary results of applying the H4U method to one use-case step.
6.4 Experimental evaluation of individual reviews
In order to evaluate the process of events identification in use cases, we decided to
conduct a series of controlled experiments.
The goal of the experiments was to analyze H4U and ad hoc use-case events
identification; for the purpose of evaluation; with respect to their review speed and
accuracy; from the point of view of both students and experienced IT professionals;
in the context of a controlled experiment performed at university and in industry.
The first experiment (Experiment 1) was conducted in academic settings to in-
vestigate how the structured and ad hoc approaches to events identification support
the work of students of software engineering.
The second experiment (Experiment 2) was conducted with the participation of
professionals, in order to observe how the methods affect the work of experienced
reviewers who work professionally on software development.
Both experiments were conducted according to the same design; therefore, we
will present the details only in the description of Experiment 1, and discuss the dif-
ferences in the description of Experiment 2. The results of both experiments were
independently analyzed by two researchers. Every disagreement was discussed until
a decision about the problematic element was made.
6.4.1 Empirical approximation of maximal set of events
In order to be able to analyze and interpret the results of the experiments, a maximal
set of all possible events in the software requirements specification used in the
experiments had to be agreed upon. After removing irrelevant events, such a set is
to be used to assess the accuracy of the events identification process. The procedure
for removing irrelevant events will be discussed in the following sections.
Three approaches to defining the reference set of all possible (acceptable and
irrelevant) events were considered:
• A1: include events that were defined by the authors of the benchmark specifica-
tion;
• A2: the authors of the paper could brainstorm in order to identify their own
reference set of possible events;
• A3: construct the approximate maximal set in an empirical way, based on all
distinct events identified by the participants in the experiments (after review-
ing them).
We decided to choose approach A3, because we assumed that a group contain-
ing around 100 people would be able to generate the most exhaustive set of events.
If we had decided to accept approach A1 or A2, we would have had the potential
problem of interpreting the proposals of events provided by the participants that
were reasonable (events that could possibly appear in this kind of system) but nei-
ther the authors of the paper nor the authors of the use-case benchmark could iden-
tify them. After the experiment, it turned out that it was a good choice, because the
set of events defined by the participants of the experiments contained all the events
that were originally present in the benchmark use cases.
Rejecting undetectable events
As it was mentioned in Section 6.2, only the events which may be detected by a sys-
tem are considered valid. For example, the event “system displayed a wrong form” is
apparent to the user, but the system cannot detect it and react to it. These kinds
of events were very often related to possible defects (bugs) in the system. All the
participants (in both experiments) proposed in total 4024 events during the experi-
ments. Based on the initial analysis, 1593 events were rejected as being undetectable
or invalid, therefore 2431 events were taken into consideration in further analysis.
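As a quick sanity check on the figures above, the filtering reduces to simple arithmetic over the reported counts:

```python
# Counts reported in the text: 4024 events proposed in total, of which 1593
# were rejected as undetectable or invalid, leaving 2431 for further analysis.
proposed = 4024
rejected = 1593
retained = proposed - rejected
print(retained)  # 2431
```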
Handling duplicated events
After the experiments, we analyzed every event in the context of the use case it be-
longs to and the use-case step by which it is triggered. As a result an abstract class
was assigned to such an event; the class represents its semantic meaning (the same
abstract event can be expressed in different ways using natural language). It was
noticed that some participants had included a number of abstract classes of events
in one event description. Moreover, during the analysis it appeared that the partici-
pants had employed different approaches to assign events to use-case steps. Let us
consider an exemplary part of the use-case scenario presented in Figure 6.7.
4. …
5. Candidate fills and submits the form.
6. System verifies if data is correct.
7. System informs Candidate that account has been created.
8. …
Figure 6.7: Example of groups of steps in a use case
An exemplary list of events that could be identified by participants, together with
the identifiers of steps the events were assigned to, are presented in Figure 6.8. It
can be noticed that all the proposed events describe the problem of missing data,
and all of them could be assigned to each of the three steps. Therefore, it does not
really matter to which of the three steps the event is assigned; however, all of
them should be treated as the same event. We have to mention that some of the
participants in similar cases assigned the same type of event to all of the steps,
i.e. they wrote down three separate events. Unfortunately, this could visibly influ-
ence the results of the experiments, as participants who had written down the same
events many times for closely related steps would have a higher number of identi-
fied events, even though the events carried the same meaning.
To solve this problem we decided to identify groups of consecutive steps that are
indistinguishable in terms of events that can be assigned to them. It was decided to
treat this kind of group of steps as a whole: if a participant assigned an event to any
of the steps from the group it was treated as if the event was assigned to the whole
group; if the participant assigned an event to more than one step from the group it
was considered as a single assignment to the whole group. The requirements specifi-
cation, being an object of the experiments, consisted of 33 use cases and 113 groups
of steps (and 156 individual steps).
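The grouping rule described above can be sketched as a small function. The data shapes (event class names, group identifiers, the `step_to_group` mapping) are hypothetical, chosen only to mirror the example in Figure 6.7:

```python
# Sketch of collapsing event assignments from individual steps to step groups:
# an event class assigned to several steps of the same group counts as a single
# assignment to the whole group.

def collapse_to_groups(assignments, step_to_group):
    """assignments: iterable of (event_class, step) pairs.
    Returns the set of distinct (event_class, group) assignments."""
    return {(event, step_to_group[step]) for event, step in assignments}

step_to_group = {5: "G1", 6: "G1", 7: "G1"}  # steps 5-7 form one group
assignments = [("MissingData", 5), ("MissingData", 6), ("MissingData", 7)]
print(collapse_to_groups(assignments, step_to_group))  # {('MissingData', 'G1')}
```

Using a set makes the three redundant per-step assignments collapse into one, so participants who repeated the same event for closely related steps are not over-credited.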
After this phase of analysis, 2510 abstract classes of events assigned to event
descriptions were identified.
Event | Step number
Candidate doesn't provide all necessary data | 5
User didn't provide all required data | 6
Candidate provided no data | 7
Figure 6.8: Example of events that could be identified for a group of use-case steps presented in Figure 6.7.
Excluding non-use-case events
As a result of the analysis of the abstract classes of events assigned to groups of
steps, we observed that some classes of valid events (which could be detected by
the system) should not be included in descriptions of use cases on the user goals level
or should be excluded from the analysis.
Most of such events related to non-functional requirements, which should be
specified separately from use cases (see the explanation in Section 6.3.2).
The second group of events that we decided to exclude from the analysis ex-
tended main scenarios with some auxiliary functions. The participants in the ex-
periments could use their imagination to propose any number of such extensions,
which would make it extremely difficult to decide whether or not we should include
them in the analysis.
The rejected classes of events are presented in Table 6.1, together with examples
and the justification for excluding them.
Events included into the maximal set of events
In the end 1550 abstract classes of events assigned to events descriptions were left
for construction of the approximation of the maximal set of events. The list of ac-
cepted events classes is presented in Table 6.2.
Table 6.1: Abstract classes of events that should not be included in use-case descriptions (they were excluded from the analysis). Each entry gives the class name, the number of such events in parentheses, an explanation, and an example.
• Authorization Problem (42): Events concerning authentication and authorization problems. They should not be included in alternative flows of a user-level use case, but handled separately as NFRs. Example: User does not have permission to add a new thread.
• Concurrency of Operations (41): Events related to concurrency problems at the level of internal system operations (e.g., concurrent modification of the same object). This kind of requirement relates to NFRs (transactional operations). There are also concurrent operations that relate to business rules (e.g., while one is applying to a major the admission is closed); this kind of event belongs to another abstract class of events and was accepted. Example: Candidate data was changed by the administrator before the candidate submitted modifications.
• Connection Problem (57): Events describing problems with the connection between a user and the system. Use cases should not include technical details; this should be described as an NFR. The system is not able to respond to such an event. Example: The data could not be sent to the server.
• Internal System Problem (339): Events describing failures inside the system. These events should not be included in user-level use cases. Example: System was not able to store the data in the database.
• Linked Data (7): Problems related to the consistency of data stored in the system. This should rather be described in the data model, business rules, or NFRs. Example: Administrator selects an item to be removed that is connected to other items in the system.
• Other Action (226): Events expressing the actor’s will to perform a different action than the action described in a use-case step. Any participant could come up with an enormous number of such events; we would not be able to assess if they are reasonable and required by the customer. Example: User wants to view the thread instead of adding a new one.
• Timeout (68): Events referring to exceeding the time given to complete a request (e.g., user session timeout). These should be described as NFRs. Example: It takes the candidate too long to fill the form.
• Withdraw (180): Events describing cases when a user wants to resign from completing a running operation. These events could be assigned to almost every step of a use case, which would make the use case cluttered. Example: User does not want to provide the data, so he/she quits.
6.4.2 Experiment 1: Student reviewers
Experiment design
The independent variable considered in the experiment was a method used to iden-
tify events in use cases (two values were considered: ad hoc and H4U). Two depen-
dent variables were defined: the speed of the review process and its accuracy. The
Table 6.2: Abstract classes of events that should be included in use-case descriptions (they were included in the analysis). Each entry gives the class name, the number of such events in parentheses, an explanation, and an example.
• After Deadline (12): The operation is not allowed, because it is performed after the deadline. Example: The admission for the chosen major is closed.
• Algorithm Runtime Error (9): Problems with the execution of an admission algorithm, which is defined by a user. Example: Unable to execute the algorithm.
• Alternative Response (89): Events describing the alternative response of an actor (usually, a negative version of the step in the main scenario). Example: The committee rejects the application.
• Connection Between Actors (65): Problems with communication between the system and external actors, i.e. actors representing other systems or devices. The system is able to respond to such an event. Example: Payment system does not respond; the system can propose to use a different method of payment.
• Lack of Confirmation (12): A user does not confirm his/her action. Example: Administrator does not confirm removing the post.
• Missing Data (481): Events concerning missing input data. Example: Candidate did not provide personal data.
• No Data Defined (259): System does not have the requested data. Example: There are no majors available.
• Payment Events (18): Events identified in a use case “Assign an application fee to a major.” They relate either to credit card payment problems (expired card or insufficient resources) or choice of an alternative payment method. Example: Credit card has expired.
• Redo Forbidden (103): Actor would like to redo some operation that can be performed only once (e.g., registering for the second time). Example: Candidate tries to register for the second time.
• Wrong Data or Incorrect Data (502): Events related to the wrong (in the sense of content) or incorrect (in the sense of syntactic format) input data. Example: Provided data is incorrect.
speed was measured as the number of steps in the use cases that were analyzed,
divided by the time required to perform the analysis (see Equation 6.1).
Speed(p) = #steps(p) / T   [steps/minute]   (6.1)
Where:
• p is a participant in an experiment;
• #steps(p) is the number of steps reviewed by the participant p;
• T is the total time spent on the review (constant: 60 minutes).
The accuracy was measured as the quotient of the number of events identified by a participant and the total number of distinct events identified by the participants of both experiments in the steps reviewed by a given participant (see Equation 6.2).

Accuracy(p) = Σ_{s=1}^{distance(p)} #events(p, s) / Σ_{s=1}^{distance(p)} #Events(s)   (6.2)
Where:
• p is a participant in an experiment;
• distance(p) is the furthest step in the specification that was analyzed by the participant p;
• #events(p, s) is the number of distinct events identified by the participant p based on the step s;
• #Events(s) is the number of distinct events identified by all participants in both experiments based on the step s.
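The two metrics can be sketched in a few lines of Python. The data below is hypothetical and the function names are illustrative, not part of the original study:

```python
# Sketch of the metrics from Equations 6.1 and 6.2 (illustrative data).
def speed(steps_reviewed: int, minutes: float = 60.0) -> float:
    """Equation 6.1: steps analyzed per minute (T was fixed at 60 min)."""
    return steps_reviewed / minutes

def accuracy(events_by_participant: list[int], events_total: list[int]) -> float:
    """Equation 6.2: events found by one participant up to the furthest step
    they reached, divided by all distinct events known for those steps.
    Both lists are indexed by step s = 1..distance(p)."""
    return sum(events_by_participant) / sum(events_total)

# Hypothetical participant: 120 steps in 60 minutes, events per step below.
print(speed(120))                      # 2.0 steps/minute
print(accuracy([2, 0, 1], [4, 1, 3]))  # 3/8 = 0.375
```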
Participants The participants in the experiment were 18 4th- and 5th-year students
of the Software Engineering programme at Poznan University of Technology. The
students were familiar with the concept of use cases. They had authored use cases
for previous courses in the SE programme curriculum, as well as in the projects they
had participated in.
The participants were randomly assigned to two groups: group G1 was asked to
use the ad hoc method and group G2 to use the H4U method.
Object — Software Requirements Specification The object of the experiment was
a use-case-based software requirements specification (SRS). To mitigate the risk of
using a specific requirements specification (i.e., too easy, or too difficult to analyze),
it was decided to use the benchmark requirements specification² ([14, 13]). This is
an instance of a typical use-case-based SRS, which was derived from the analysis of
524 use cases coming from 16 software development projects.
For the purpose of the experiment the original benchmark specification was purged by removing all the events, so that only bare scenarios remained: the resulting document consisted solely of the use-case headers and main scenarios.
Operation of the experiment
Prepared instrumentation The following instrumentation was provided for the par-
ticipants of the experiment:
• A presentation containing around 30 slides describing the basics of use cases.
The G1 group was presented with the information about the ad hoc approach
²The benchmark requirements specification is available on-line at http://www.ucdb.cs.put.poznan.pl.
to events identification. The information about the H4U method was pre-
sented only to the G2 group. At the end of the presentation, both groups were
informed about the details of the experiment.
• A benchmark specification being the object of the study.
• An editable form where the participants filled in the identified events.
• A questionnaire, prepared in order to identify possible factors which might
influence the results of the experiment.
All the materials were provided in English, as all the classes of the Software Engi-
neering programme are conducted in English.
Execution The procedure of the experiment consisted of the following steps:
1. The supervisor of the experiment gave the Presentation (15 minutes).
2. The participants read the Benchmark Specification and noted all the identified
events.
3. After every 10 minutes the participants were asked to save the file with the events under a different name, and to change the color of the font they were using. This allowed the experimenters to analyze how the set of identified events was changing over time. The students were given 60 minutes to finish their task.
4. After the time was up, the students were asked to fill out the questionnaire.
Data validation After collection of the data, all questionnaires were reviewed. None
were rejected due to incompleteness or errors.
Analysis and interpretation
Descriptive statistics Before proceeding to further analysis, the experimental data was investigated to find and handle outlying observations. After analyzing the descriptive statistics presented in Table 6.3, we did not observe any potentially outlying observations.
The next step in the analysis was the investigation of the sample distributions in order to choose proper statistical hypothesis tests.
The initial observation, based on an analysis of the Q–Q plots, was that the assumption about sample normality might be violated for the speed variable. This suspicion was further confirmed by the Shapiro–Wilk test.³ Therefore, we decided to use non-parametric statistical tests for the speed variable and parametric tests for the accuracy variable.
³The results of the Shapiro–Wilk test (α = 0.05) — Speed G1: W = 0.804, p-value = 0.023 (reject H0), G2: W = 0.860, p-value = 0.096 (do not reject H0); Accuracy G1: W = 0.930, p-value = 0.477 (do not reject H0), G2: W = 0.974, p-value = 0.925 (do not reject H0).
Table 6.3: Descriptive statistics (groups G1 and G2).

            Speed [steps/min]     Accuracy
            G1       G2           G1      G2
#1          2.63     0.35         0.25    0.36
#2          1.83     0.12         0.22    0.44
#3          2.58     0.40         0.18    0.26
#4          2.65     0.50         0.09    0.22
#5          1.40     0.45         0.13    0.26
#6          1.95     0.33         0.09    0.34
#7          2.62     1.35         0.25    0.31
#8          2.60     1.17         0.14    0.05
#9          1.85     1.07         0.19    0.12
N           9        9            9       9
Min         1.40     0.12         0.09    0.05
1st Qu.     1.85     0.35         0.13    0.22
Median      2.58     0.45         0.18    0.26
Mean        2.23     0.64         0.17    0.26
3rd Qu.     2.62     1.07         0.22    0.34
Max         2.65     1.35         0.25    0.44
Hypotheses testing In order to compare the review speed of the H4U method to
the typical ad hoc approach, the following hypotheses were formulated. The null
hypothesis stated that there is no difference regarding the review speed of the H4U
and ad hoc methods:
Ha0 : θSpeed(G1) = θSpeed(G2) (6.3)
The alternative hypothesis stated that the median review speed differs between
these two methods:
Ha1 : θSpeed(G1) ≠ θSpeed(G2) (6.4)
If accuracy is considered, it is important to investigate whether H4U improves
the accuracy of events identification in use cases. Therefore, the following hypothe-
ses were defined:
Hb0 : μAccuracy(G1) = μAccuracy(G2) (6.5)

Hb1 : μAccuracy(G1) < μAccuracy(G2) (6.6)
The Wilcoxon rank-sum test was used to test the hypotheses regarding speed and the t-test to test the hypotheses concerning accuracy, both with the significance level α set to 0.05.
As the result of the testing procedure, both null hypotheses were rejected (p-
values were equal to 4.114e-05 and 3.214e-02).
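The speed comparison can be reproduced from the raw data in Table 6.3. As a sketch (valid here because the pooled sample has no tied values), an exact two-sided rank-sum test can be computed in plain Python by enumerating all C(18, 9) equally likely assignments of ranks to group G1; since the two speed samples do not overlap at all, this reproduces the reported p-value of 4.114e-05:

```python
from itertools import combinations
from math import comb

# Review speeds from Table 6.3 (G1 = ad hoc, G2 = H4U).
g1 = [2.63, 1.83, 2.58, 2.65, 1.40, 1.95, 2.62, 2.60, 1.85]
g2 = [0.35, 0.12, 0.40, 0.50, 0.45, 0.33, 1.35, 1.17, 1.07]

pooled = sorted(g1 + g2)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # valid: no tied values here
observed = sum(rank[v] for v in g1)               # rank sum of group G1
n, n1 = len(pooled), len(g1)
expected = n1 * (n + 1) / 2                       # mean rank sum under H0

# Exact null distribution: every choice of n1 out of n ranks is equally likely.
extreme = sum(
    abs(sum(c) - expected) >= abs(observed - expected)
    for c in combinations(range(1, n + 1), n1)
)
p_two_sided = extreme / comb(n, n1)
print(p_two_sided)   # ~4.11e-05, i.e. reject Ha0 at alpha = 0.05
```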
Power analysis The observed normalized effect size⁴ expressed as Cohen’s d coefficient ([32]) was “large”⁵ (Cohen’s d for review speed was equal to 3.49; for accuracy it was equal to 0.96).
Sensitivity analysis, which computes the required effect size given the assumed significance level α, statistical power (1-β), and sample size, revealed that in order to
achieve a high power for the performed tests (0.95) the effect size should be equal
to at least 1.62 (one-tailed t-test) and 1.97 (two-tailed Wilcoxon rank-sum test).
The observed effect size for speed was visibly greater than 1.97, while the one
observed for accuracy was lower than 1.62 (the calculated post hoc power was equal
to 0.62).
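The effect sizes can be recomputed from Table 6.3 with the pooled-standard-deviation form of Cohen’s d. This is a sketch working from the rounded table entries, so the accuracy figure comes out near, but not exactly at, the reported 0.96:

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d with the pooled sample standard deviation (equal group sizes)."""
    pooled_sd = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd

speed_g1 = [2.63, 1.83, 2.58, 2.65, 1.40, 1.95, 2.62, 2.60, 1.85]
speed_g2 = [0.35, 0.12, 0.40, 0.50, 0.45, 0.33, 1.35, 1.17, 1.07]
acc_g1 = [0.25, 0.22, 0.18, 0.09, 0.13, 0.09, 0.25, 0.14, 0.19]
acc_g2 = [0.36, 0.44, 0.26, 0.22, 0.26, 0.34, 0.31, 0.05, 0.12]

print(round(cohens_d(speed_g1, speed_g2), 2))  # 3.49 -> "large" per [32]
print(round(cohens_d(acc_g2, acc_g1), 2))      # ~0.95, also "large"
```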
Interpretation Based on the results of the experiment, one could conclude that the
H4U method can help to achieve higher accuracy of identification of events in use-
case scenarios when used by junior software engineers.
Unfortunately, the higher accuracy of the H4U method was achieved by visibly
lowering the speed of the review process (the cumulative number of steps analyzed
by both groups in time are presented in Figure 6.9).
Threats to validity
The following threats to validity need to be taken into account when interpreting the
results of the experiment.
First, it needs to be noted that only 18 (9 for each group) participants took part
in the experiment, which constitutes a small sample.
Second, the participants were students; hence, it might be impossible to gener-
alize the results to the group of professional analysts. However, most of the partici-
pants had some practical experience in software houses, so the results might be gen-
eralized to junior developers. It is also noteworthy that the participants who used
the H4U method were not familiar with this method prior to the experiment; hence,
they were learning this approach during the experiment. This factor could possibly
have a negative influence on the review speed of the group using the H4U method.
An additional threat to conclusion validity is posed by the disproportion be-
tween the number of steps analyzed by participants using different methods. The
higher speed of ad hoc groups enabled them to analyze a greater number of steps
than participants using H4U. As a result the number of participants using H4U and
⁴Please note that the retrospectively calculated effect size only approximates the real effect size in the populations from which the samples were drawn. More information can be found in [127].
⁵According to [32] effect size is perceived as “small” if the value of d is equal to 0.2, as “medium” if the value of d is equal to 0.5, and as “large” if d is equal to 0.8.
[Figure 6.9 consists of four panels — “H4U Univ”, “Ad hoc Univ”, “H4U Ind”, “Ad hoc Ind” — each plotting the cumulative number of processed steps (0–200) against time (10–60 min).]
Figure 6.9: The cumulative number of steps processed by reviewers in time (Univ — Experiment1; Ind — Experiment 2).
contributing proposals of events was decreasing in consecutive steps. As presented in Figure 6.10, this could have led to an increase in the accuracy of the ad hoc groups.
[Plot titled “Experiments limited to the number of steps”: the mean accuracy ratio H4U/AdHoc (1.0–2.5) against the number of steps (20–153); legend: Ind, Univ.]
Figure 6.10: The ratio between the mean accuracy of H4U and Ad hoc if the scope of experimentswas limited to a given number of steps.
6.4.3 Experiment 2: Experienced reviewers
Experiment design
Participants The participants in the second experiment were 82 professionals, with
experience ranging from 1 to 20 years (mean 6 years, standard deviation 4 years).
The group was heterogeneous with respect to roles performed by the participants
in projects (software developers & designers, architects, analysts, and project man-
agers). Most of the participants had experience with use cases: 32% had used them
in many projects, 24% had used them in at least one project, 22% mentioned only a
couple of use cases. There were also participants who did not have any experience
related to use-case-based requirements: 14% had heard about use cases but had not
used them, 8% had never heard about use cases.
The participants were randomly assigned to two groups: group G3 was asked to use the ad hoc method and group G4 to use the H4U method.
Operation of the experiment
Prepared instrumentation The materials were translated into Polish, because we could not assume that all participants were fluent in English. In addition, a short survey was prepared to collect demographic information as well as to obtain opinions about the H4U method from the professionals.
Data validation After collection of the data, all questionnaires were reviewed. We
decided to reject 18 observations because of their incompleteness. Unfortunately,
some of the participants were interrupted by their duties and had to leave before
the end of the experiment. Some decided to identify events and write alternative
scenarios to practice the whole use-case development process, which resulted in
less time spent on the experiment task of solely identifying events.
Analysis and interpretation
Descriptive statistics The experiment data was investigated to find and handle out-
lying observations. After analyzing the descriptive statistics presented in Table 6.4,
we found 1 potentially outlying observation in group G3 with respect to review speed.
However, after deeper investigation of the work, we did not find any justification for
rejecting it (the ad hoc process does not impose any specific approach to events
identification, so the review speed can differ visibly).
In the next step of the analysis, sample distributions were investigated in order to choose proper statistical hypothesis tests.
Table 6.4: Descriptive statistics (groups G3 and G4).

            Speed [steps/min]     Accuracy
            G3       G4           G3      G4
#1          2.18     2.32         0.19    0.13
#2          2.07     0.85         0.04    0.36
#3          2.20     0.38         0.13    0.45
#4          2.12     2.02         0.13    0.05
#5          0.68     1.97         0.14    0.03
#6          2.17     0.78         0.16    0.23
#7          1.90     0.68         0.11    0.09
#8          0.87     0.38         0.21    0.08
#9          2.23     1.07         0.09    0.10
#10         2.30     1.35         0.24    0.28
#11         2.07     2.07         0.08    0.11
#12         0.68     0.48         0.22    0.23
#13         2.22     1.07         0.09    0.07
#14         2.07     0.48         0.15    0.33
#15         2.02     0.97         0.25    0.19
#16         2.20     0.38         0.28    0.14
#17         2.18     0.42         0.07    0.23
#18         2.02     0.48         0.14    0.09
#19         2.17     0.68         0.22    0.35
#20         1.90     0.55         0.15    0.18
#21         1.07     0.78         0.03    0.26
#22         2.02     0.78         0.07    0.17
#23         2.18     0.78         0.21    0.36
#24         1.97     1.90         0.18    0.14
#25         1.60     1.48         0.12    0.08
#26         1.43     1.83         0.11    0.07
#27         2.02     1.17         0.05    0.31
#28         0.42     0.48         0.15    0.26
#29         0.55     1.07         0.20    0.14
#30         0.87     0.68         0.26    0.15
#31         2.08     2.22         0.11    0.23
#32         2.17     0.85         0.12    0.30
N           32       32           32      32
Min         0.42     0.38         0.03    0.03
1st Qu.     1.56     0.53         0.10    0.10
Median      2.05     0.82         0.14    0.17
Mean        1.77     1.04         0.15    0.19
3rd Qu.     2.17     1.38         0.20    0.26
Max         2.30     2.32         0.28    0.45
The observation made based on the analysis of the Q–Q plots was that the assumption about sample normality might have been violated for the speed variable. This suspicion was confirmed by the Shapiro–Wilk test.⁶ Therefore, again we de-
⁶The results of the Shapiro–Wilk test (α = 0.05) — Speed G3: W = 0.742, p-value = 3.953e-06 (reject H0), G4: W = 0.868, p-value = 1.056e-03 (reject H0); Accuracy G3: W = 0.973, p-value = 0.577 (do not reject H0), G4: W = 0.951, p-value = 0.153 (do not reject H0).
cided to use the same non-parametric statistical test to test the hypotheses regard-
ing speed and the same parametric statistical test to test the hypotheses concerning
accuracy.
Hypotheses testing In order to compare the review speed and accuracy of the H4U
method with the ad hoc approach, a set of hypotheses similar to those formulated
in Experiment 1 was defined.
The null hypothesis related to review speed stated that there is no difference
regarding the review speed of the H4U and ad hoc methods:
Hc0 : θSpeed(G3) = θSpeed(G4) (6.7)
The alternative hypothesis stated that the median review speed differs between
these two methods:
Hc1 : θSpeed(G3) ≠ θSpeed(G4) (6.8)
In addition, the following hypotheses regarding accuracy were defined:
Hd0 : μAccuracy(G3) = μAccuracy(G4) (6.9)

Hd1 : μAccuracy(G3) < μAccuracy(G4) (6.10)
As a result of performing the testing procedure, the null hypothesis Hc0 regarding review speed was rejected (p-value was equal to 5.055e-05). The second null hypothesis Hd0 concerning accuracy was also rejected (p-value was equal to 2.477e-02).
Power analysis The observed normalized effect size expressed as Cohen’s d coeffi-
cient was “large” for review speed (Cohen’s d was equal to 1.21) and “medium” for
accuracy (Cohen’s d was equal to 0.50).
The sensitivity analysis revealed that in order to achieve a high power for the
performed tests (0.95), the effect size should be at least equal to 0.83 (one-tailed
t-test) and 0.99 (two-tailed Wilcoxon rank-sum test).
The effect size observed for the accuracy was smaller than 0.83. The calculated
post-hoc power of the test was equal to 0.63. The effect size observed for the speed
was higher than 0.99.
Interpretation Based on the results of the experiment, one could conclude that the
H4U method can help to achieve higher accuracy of identification of events in use-case scenarios when used by software engineers (not necessarily analysts).
Unfortunately, again the higher accuracy of the H4U method was achieved by lowering the speed of the review process: the average review speed of the H4U method was 1.7-2.5 times lower than that of the ad hoc approach (the cumulative number of steps analyzed by both groups over time is presented in Figure 6.9).
In both experiments, we have observed that the students outperformed the pro-
fessionals in terms of accuracy. This observation was unexpected. We suspect that
it is caused by the fact that the students were more up to date with the topic of use
cases — they were learning about use cases and practicing during the courses.
Another observation made during both experiments is that the accuracy of events
identification is generally low if one considers individual reviewers. The observed
mean accuracy ranged from 0.15 (for professionals using the ad hoc method) to 0.26
(for students using H4U). This means that a single reviewer was “on-average” able to
detect only a quarter of all the events. Therefore, there is a need for a better method
of supporting identification of events.
When it comes to the review speed of the approaches, a non-structured ad hoc
approach seems visibly more efficient. However, in the survey conducted after the
experiment, more participants using the ad hoc approach had the feeling of solving
tasks in a hurry (65%) than did users of the H4U method (56%).
Although the H4U method forces users to follow a specific procedure, most of
the participants did not perceive it as difficult (∼68% “definitely no” and “no”; ∼19%
“difficult to say”; ∼13% “definitely yes” and “yes”).
In addition, more than half of the participants stated that they would like to use
the method in their next projects (∼53% “definitely yes” and “yes”; ∼34% “difficult
to say”; ∼13% “definitely no” and “no”).
Threats to validity
Again, there are some threats to the validity of the study that need to be taken into
consideration.
First, there is a threat related to experimental mortality. Among 82 participants
18 did not complete the experiment task, because they were disturbed by their pro-
fessional duties.
The threat to conclusion validity regarding the disproportion between the num-
ber of steps analyzed by both groups seems less serious than in Experiment 1 (see
Figure 6.10).
There is also a threat related to external validity. Although the experiment in-
volved professionals, only some of them were fulfilling the role of analyst in software
development projects.
It also needs to be noted again that the participants who used the H4U method
were not familiar with this method prior to the experiment; hence, they were learn-
ing this approach during the experiment. This factor could possibly have a negative
influence on the review speed of the group using the H4U method.
6.5 Usage of HAZOP keywords
The proposed H4U method includes all of the 11 secondary keywords as suggested
by [102]. However, the analysis of the results of the conducted experiments leads to
the conclusion that not all of the keywords are used equally. The identified events in
both experiments were divided into three sets. The first set contained events which
could be detected by the system (according to the Detectable Conditions pattern
— [9]) and which were proper use-case events, the second set consisted of events
which could be detected by the system, but were not correct use-case events (e.g.
they were non-functional requirements), while the third set contained events which could not be detected by the system. The first set is the most important one, as it contains the events which should be included in the use-case-based requirements specification. Therefore, the keywords which helped identify the events from the first
set might be considered more useful. During the experiments, the participants who
used the H4U method were asked to choose which keyword helped them to iden-
tify the event (according to the H4U report table as presented in Figure 6.5). We
counted how many events were identified with usage of a given keyword. As pre-
sented in Table 6.5, some of the keywords were used very often (e.g., NO - about
48%) while other keywords were used very rarely (e.g., AS WELL AS - less than 1%)
and their contribution to the identification of detectable events was very small. The
presence of these very rarely used keywords might have made the analysis more te-
dious and could negatively influence the results. It appears that some of the key-
words might be merged, e.g. MORE and AS WELL AS; some of the participants were
using these keywords interchangeably. The pairs EARLY - BEFORE and LATE - AFTER were similar: although the meanings of the keywords were different, they led to the discovery of very similar events. Merging these pairs would decrease the number of keywords, which could shorten the time needed for events identification with
the H4U method. Moreover, as noted in Section 4, the median review speed of the
H4U method was not higher than 0.82, so there is a lot of room for improvement.
On the other hand, the introduction of additional, more specific keywords might
help to increase the number of identified events. However, it should be noted that
the use of keywords might have been different if requirements specifications from
different domains were used; hence, for every business domain the presence of particular keywords should be carefully analysed. In the case of a business domain which has never been analysed with the H4U method, where there is no experience in using the keywords, using the standard set of keywords is advised.
Furthermore, it is possible that the order of the keywords (they were presented
in the following order: NO, MORE, LESS, AS WELL AS, PART OF, REVERSE, OTHER
THAN, EARLY, LATE, AFTER, BEFORE) might influence how they were used by the
participants. There is a need for further research to see how the set of the secondary
keywords could be altered to increase the accuracy and review speed of the H4U
method.
Table 6.5: Percentage usage of HAZOP secondary keywords in both experiments (used in order to identify: detectable use-case events, detectable non-use-case events and undetectable events).

Keyword       Detectable         Detectable             Undetectable
              use-case events    non-use-case events    events
NO            47.5               44.4                   38.5
MORE          5.2                4.2                    10.0
LESS          12.1               3.0                    12.0
AS WELL AS    0.4                0.8                    1.7
PART OF       7.7                3.4                    10.7
REVERSE       6.0                11.0                   3.8
OTHER THAN    5.8                5.3                    10.7
EARLY         2.7                4.2                    2.3
LATE          1.9                7.2                    4.0
BEFORE        8.5                7.8                    4.6
AFTER         2.0                9.1                    1.8
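The per-keyword tally behind a table like Table 6.5 can be sketched as follows; the event records and class labels below are hypothetical, stand-ins for the keyword choices the participants marked on the H4U report forms:

```python
from collections import Counter

# Each identified event is recorded with the keyword that triggered it and
# the class it was later assigned to (labels here are illustrative).
events = [
    ("NO", "detectable use-case"), ("NO", "detectable use-case"),
    ("LESS", "detectable use-case"), ("MORE", "undetectable"),
    ("AS WELL AS", "undetectable"),
]

totals = Counter(cls for _, cls in events)   # events per class
pairs = Counter(events)                      # events per (keyword, class)
for (kw, cls), n in sorted(pairs.items()):
    # Percentage of the class's events attributed to this keyword.
    print(f"{kw:<11} {cls:<20} {100 * n / totals[cls]:5.1f}%")
```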
6.6 Related Work
Many authors have noted the problem of requirements specification completeness
with the focus on events associated with use cases. [120] included the following
question in his inspection checklist for use cases: Are all known exception conditions
documented? [83] identified 5 characteristics concerning requirements specification completeness; one of them reads: all scenarios and states are recognized. [35]
suggested 3 types of requirement defects, from the least severe, which have minimal
impact, to the most severe. To deal with the last type of defects, Cox et al. included
the following question in their use-case inspection guidelines: Alternatives and ex-
ceptions should make sense and should be complete. Are they? In order to solve the
issue of incomplete exceptions Carson proposed a 3-step approach for requirements
elicitation ([28]). The goal of the third step in this approach is to verify requirements
completeness, based on the idea of a complementary set of conditions under which
the requirements are to be performed ([27]).
Although the processes and approaches mentioned above, like many others ([18, 22]), seem to emphasize the importance of event-completeness, they do not propose any specific method or technique aimed at identifying events. Our proposal is
to adopt the HAZOP approach for this purpose.
HAZOP has already been used in many software development contexts. [102]
described how HAZOP could be used for the analysis of safety issues in software
systems. [63] proposed UML-HAZOP, a method for discovering anomalies in UML
diagrams. He uses HAZOP to obtain a checklist against which the quality of UML
diagrams can be assessed.
None of the above authors has applied HAZOP to use cases. [17] were the first to apply HAZOP to scenarios and use cases. Their aim was to identify hazards in safety
critical systems. In their approach they analyzed pre-, post- and guard-conditions
together with system responses. The output of this approach is a list of possible hazards and failures of the system. Another example of applying HAZOP for analysis of use
cases is the work by [111]. She and her coworkers were interested in identification of
security requirements. Their approach requires complete use cases (pre-conditions,
main scenarios and extensions, including events and alternative scenarios) as an in-
put. With the help of HAZOP keywords, they determined possible security require-
ments and policies.
The two above approaches might seem similar to the proposed H4U method as
they are applying HAZOP to a use case. In fact they are different. [111], and [17]
assume more or less complete use cases as an input, and the output is a report on
possible security and safety issues. Moreover, there is a difference in the keywords
used by [111], and [17], and H4U. As described in Section 2, H4U is part of a func-
tional requirements elicitation, whereas the approaches presented by [111], and [17]
are concerned with nonfunctional requirements. Analysis of use cases has also been
studied by others ([16, 112, 113]); however, they are not using HAZOP at all.
6.7 Conclusions
Event completeness of use cases is one of the quality factors of use-case based re-
quirements specification. Missing events might lead to higher project costs ([114]).
Therefore, a method called H4U (HAZOP for Use Cases) aiming at events identifi-
cation, was proposed in the paper and evaluated experimentally. The method was
compared to the ad hoc review approach. The main contribution of this paper is
that the accuracy of events identification was unsatisfactory in both cases; however, usage of the HAZOP-based method led to significantly better accuracy.
The results led to the following more detailed conclusions:
• The maximum average accuracy of events identification in both conducted
experiments was 0.26. This shows that events identification is not an easy task,
and many events might be omitted during the analysis, hence new methods
and tools are needed to mitigate this problem.
• The proposed HAZOP-based method can help in achieving higher accuracy of
events identification. This is true in two distinct environments: IT professionals (0.19 accuracy with H4U and 0.15 with the ad hoc approach) and students (0.26 accuracy with H4U and 0.17 with the ad hoc approach).
• H4U enables higher accuracy of events identification; however, it requires more time to perform a review. In both groups, IT professionals and students, the ad hoc approach was more efficient in terms of the number of steps analyzed. On average, the ad hoc review speed varied from 1.77 steps analyzed per minute by professionals to 2.23 steps analyzed per minute by junior analysts, while the review speed of the H4U method was 1.04 and 0.64 steps analyzed per minute, respectively.
• Students versus professionals. There is a common doubt as to whether an ex-
periment with the participation of students could provide results that remain
valid for professionals. One of the most debatable questions regards the differ-
ences between the effectiveness of (typically inexperienced) students in com-
parison to more experienced professionals. Some of the research performed
so far indicates that for less complex tasks the difference between students
and professionals is minor ([51]). In the experiments described in the paper
we observed that master-level SE students can even outperform professionals
in some cases.
Acknowledgements
We would like to thank all participants of the experiments for their active partici-
pation. This research was funded by the Polish National Science Center based on the decision DEC-2011/01/N/ST6/06794 and by the PUT internal grant DS 91-518.
Chapter 7
Automatic identification of events
in use cases
Preface
This chapter contains the report: Jakub Jurkiewicz, Jerzy Nawrocki, Automated
events identification in use cases, Politechnika Poznanska, RA-5/201.
My contribution to this chapter included the following tasks: (1) analysis of a set
of real-life use-cases in order to build a knowledge-base about the abstract types of
events and actions occurring in use-cases; (2) elaboration of inference rules aimed
at identification of abstract types of use-case event; (3) implementation of the proto-
type tool (including inference, NLP and NLG layers); (4) evaluation of the accuracy
and speed of the proposed method; (5) conducting and evaluating the Turing-test
based experiment.
Context: The results from Chapter 6 show that identification of events in use-cases
is not an easy task. Participants of the experiments were able to identify, on aver-
age, only 18% of possible events when using the ad hoc approach and 26% of pos-
sible events when using the HAZOP-based method. Such incompleteness of events-
identification might have a negative influence on software development phases, which
follow requirements elicitation and analysis. Moreover, we observed that manual review of use-cases is a time-consuming process. Thus, the challenge is how to speed up identification of events by automating this activity in such a way that the completeness of the obtained sets of events is improved.
Objective: The goal of this Chapter is to propose a method of identification of events
in use cases, evaluate its accuracy, and compare it with accuracy of events identifi-
cation performed by humans.
Method: To propose a method of automated identification of events, 160 use cases
from software projects have been analysed. This analysis resulted in identification
of 14 abstract types of events and two inference rules, which became a founda-
tion for the proposed method. The method was implemented as a plugin to UC
Workbench, a use-case editor. The effort required to identify events in use cases
(review speed) and completeness of events-identification (accuracy) performed by
the proposed method was evaluated with the use of a benchmark specification and
compared to the results for the ad hoc and HAZOP-based approaches. Moreover, to
evaluate the linguistic quality of the automatically identified events, an experiment
based on assumptions of the Turing-test was conducted.
Results: The conducted research resulted in the following findings:
• An automated method for identification of events was proposed. The ob-
served accuracy of this method was at the level of 0.8.
• The proposed automated method achieved higher accuracy than the man-
ual approaches: the maximum average accuracy for the ad hoc approach was
equal to 0.18 and for the HAZOP-based method it was equal to 0.26.
• The automated method, although not optimized for speed, was faster than
the manual ones. It was able to analyse, on average, 10.8 steps per minute,
while the participants of the experiments were able to analyse, on average, 2.5
steps per minute with the ad hoc approach, and 0.5 steps per minute with the
HAZOP-based approach.
• The understandability of event descriptions generated by computer was not
worse than the understandability of descriptions written by humans.
Conclusion:
The proposed method of events identification is faster and more accurate than
the manual methods (the ad hoc method and the HAZOP-based one). The under-
standability of event descriptions generated by computer is not worse than the un-
derstandability of descriptions written by humans.
7.1 Introduction
Requirements highly influence different aspects of software development [67, 78, 86,
92]. One of the approaches to specifying functional requirements is the concept of
the use case. It was proposed by Ivar Jacobson [59] and further developed by Alistair
Cockburn [9, 31] and other authors [53]. Roughly speaking, a use case is a descrip-
tion of user-system interaction expressed in natural language. The focal element
of a use case is the main scenario consisting of a sequence of steps that describe
the mentioned user-system interaction. While a step is being executed, some events
might happen. Descriptions of those events along with alternative scenarios are also
included in a use case.
The question of how to identify a (more or less) complete list of possible events
arises. Research shows that non-identified events (some call them “missing logic
and condition tests for error handling” [114]) are the most common cause of changes
in requirements which result in requirements volatility and creeping scope. The im-
portance of event-completeness has been recognized by many authors (although
they do not use this specific term), including Carson [28], Cox [35], Mar [83] and
Wiegers [120]. The approach they recommend is based on identification of events
performed by experts using brainstorming or other “manual” techniques. There is
also a method called H4U [65]. It is a form of a HAZOP-based review, which aims
at identifying events in use cases. The main disadvantage of manual identification
of events is the time and effort required by such a process. Thus, a question arises
whether it is possible to identify events automatically and obtain good quality event
descriptions in the sense of event-completeness and their readability.
The above question is also interesting from the artificial intelligence point of
view. The following saying is attributed to Pablo Picasso: Computers are useless. They
can only give you answers [2]. Identifying events that might happen during execution of a use-case step resembles a dialog in which a human is saying a simple
sentence (e.g. This evening I am going to the opera house. — that corresponds to a
use-case step), and a computer is asking a question (e.g. What if all the tickets are
sold? — that question includes the event all the tickets are sold). Asking such event-related questions requires understanding a sentence, i.e. some knowledge is necessary (in this example it is knowledge about opera and tickets), as well as the ability to infer events with the use of that knowledge.
In this paper a method of automatic identification of events in use cases is pre-
sented. First, in Section 7.2, the model of use-case based requirements specification
is outlined. The central part of the method is the inference engine that was used
for the generation of event names. It is described in Section 7.3. The mechanism
for analysing a use-case step with a Natural Language Processing (NLP) tool is pre-
sented in Section 7.4. The events are generated using Natural Language Generation
(NLG) techniques — see Section 7.5. Empirical evaluation of the proposed approach
is discussed in Section 7.6. Related work is presented in Section 7.7.
ID: UC1
Name: Edit a post
Actor: Author
Main scenario:
1. Author chooses a post.
2. Author chooses the edit option.
3. System prompts for entering the changes.
4. Author enters the changes to the post.
5. Author confirms the changes.
6. System stores the changes.
Extensions:
4.A. Post with the entered title already exists.
4.A.1. System displays an error message.
Figure 7.1: Example of a textual use case. Main elements: 1 - name, 2 - main scenario section, 3 - extensions section, 4 - an event, 5 - an alternative scenario.
7.2 Use-case based requirements specification
In this paper it is assumed that a functional requirements specification document
contains the following elements:
• Actors - descriptions of actors interacting with the system being built.
• Information Objects - presentation of domain objects the actors and the sys-
tem operate on. Every information object can have a set of different properties
assigned to it (the possible properties are described in Section 7.3).
• Use Cases - descriptions of user-system interactions in the form of scenarios
performed in order to achieve user’s goals. Every use case consists of a name
describing a user’s goal and a main scenario consisting of a sequence of steps.
Steps in the main scenario may be interrupted by events described within ex-
tensions. Additionally, alternative scenarios describe how given events should
be handled. An exemplary use case is presented in Figure 7.1.
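For illustration only, the three kinds of specification elements above can be pictured as a small data model. The class and field names below are our own sketch (they are not part of UC Workbench or any cited tool):

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str                                       # e.g. "Author"

@dataclass
class InformationObject:
    name: str                                       # e.g. "Post"
    properties: set = field(default_factory=set)    # e.g. {"COMPOUND"}, see Section 7.3

@dataclass
class UseCase:
    id: str
    name: str                                       # the user's goal, e.g. "Edit a post"
    actor: Actor
    main_scenario: list                             # numbered steps, plain sentences
    extensions: list = field(default_factory=list)  # events + alternative scenarios

# The use case from Figure 7.1, abbreviated:
uc1 = UseCase(
    id="UC1",
    name="Edit a post",
    actor=Actor("Author"),
    main_scenario=["Author chooses a post.",
                   "Author chooses the edit option."],
)
print(uc1.name, len(uc1.main_scenario))
```

A use case stub in this encoding is simply an instance whose extensions list is still empty; the method proposed in this chapter fills it in.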
Use cases have been used in both research and industry since their introduction
in 1985 [59]. Research shows that scenario-based notations (including use cases) are
the most popular way of describing customer requirements [92]. Two main factors
are responsible for their popularity: firstly, the clear focus on describing the goals of the system; secondly, the natural language in which the requirements are expressed. The latter makes it easier for the customer and prospective end-users to understand the requirements document; however, it makes it hard to process the specification in an automatic way.
7.3 Inference engine for identification of events
7.3.1 The problem
It is assumed that a functional requirements specification on the input contains de-
scriptions of actors, descriptions of information objects, and use case stubs, i.e. use
cases without their extension parts (see Fig. 7.1). To complete use case stubs with
event descriptions the following problem is discussed in this paper:
Events identification problem: Given a step from the main scenario of a use case, identify all the events that can happen when the step is executed. Only events which
can be detected by the system are of interest (that viewpoint is supported by Adolph
et al. [9] in their Detectable conditions pattern).
7.3.2 Abstract model of a use-case step
The process of event identification consists of four phases:
1. Preprocessing of the description of actors and information objects that creates
a dictionary of these elements (it resembles the creation of a symbol table by
a compiler during parsing).
2. Analysing a step using NLP techniques (a main scenario is analysed step by
step).
3. Inferring the possible events for a given step (that results in abstract descrip-
tions of events which are independent of natural language).
4. Generating event descriptions in natural language.
The overview of this process is presented in Figure 7.2. The most important part of the presented method is the inference about the possible events.
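As a rough sketch, the four phases can be read as a pipeline. The stub functions below only illustrate the data flow between the phases; their names and toy logic are ours, not the actual UC Workbench plugins:

```python
# Illustrative stubs for the four phases of event identification.
def preprocess(actors, objects):                 # Phase 1: build the dictionary
    return {"actors": set(actors), "objects": set(objects)}

def analyse_step(step, dictionary):              # Phase 2: NLP -> <R, A, O>
    words = step.rstrip(".").split()
    subject, verb, obj = words[0], words[1], words[-1]
    role = "USER" if subject in dictionary["actors"] else "SYSTEM"
    return role, verb.upper().rstrip("S"), obj.lower()

def infer_events(role, activity, obj):           # Phase 3: inference engine
    observed = {("USER", "ENTER", "post"): ["Incomplete"]}   # toy Observed set
    return observed.get((role, activity, obj), [])

def describe(event, step):                       # Phase 4: NLG
    return f"{event} (may occur in step: {step})"

def identify_events(steps, actors, objects):
    dictionary = preprocess(actors, objects)
    return {s: [describe(e, s) for e in infer_events(*analyse_step(s, dictionary))]
            for s in steps}

print(identify_events(["Student enters a post."], ["Student"], ["post"]))
```

The real phases are far richer (full NLP, the complete Observed set, SimpleNLG); the sketch only shows how the output of each phase feeds the next.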
Our approach to inference about events is based on a model of human-computer
interaction, which evolved from our study of available use cases and events associated with them. Our aim was to provide the simplest possible model that would allow inference about all the events we have found in the considered use cases. The model is twofold and is illustrated in Figure 7.3.
Model of the real world (Fig. 7.3A). Some events are a consequence of a state in
which a real object exists (e.g. a store is empty). The model represents real world
objects (e.g. a book in a library, an article in a newspaper). Interestingly, for the purpose of inferring events, quite a simple tool proved to be powerful enough: set theory. A fragment of the real world we are interested in can
be viewed as individual objects (i.e. items) or collections of objects (i.e. sets). Fur-
thermore, items might be simple or they might be compound and consist of other
items. It is important to notice that some activities do not change the real world
[Figure: Analyst enters use-case stubs and descriptions of actors and information objects; preprocessing builds a dictionary of actors and information objects; Natural Language Processing produces abstract models of use-case steps (actors, activities, information objects); the Events Inference Engine assigns event types to use-case steps; Natural Language Generation yields use-case stubs completed with events.]
Figure 7.2: An overview of use-case processing and events analysis: 1 - initial preprocessing of use-case stubs, actors and information objects descriptions (using NLP tools); 2 - generation of abstract classes of events (events inference engine); 3 - events names generation (using NLG tools).
- they just provide us with information about the world (state) as-is. On the other
hand, there are some activities which change the world (i.e. its state) and they lead
to a new state (as-to-be). In our view, without awareness of the last category of activ-
ities, identifying some events would be impossible. Moreover, the state of real world
objects can change over time, e.g. a credit card can expire on some day.
Model of IT system (Fig. 7.3B). Some events are associated with human-computer
communication or computer-computer interaction (e.g. an external service is not
available). This model assumes that all the interactions between the Actor and the System happen through the user interface (UI). The Actor can perform CRUD+SV operations (Create, Read, Update, Delete, Select, Validate) on data (both internal and
external to the System). In the case of operations which can be harmful (e.g. deleting
an information object), the Actor might need to confirm his operation, so Confirm
[Figure: A - the real world modelled as simple and compound items and sets, with state-changing activities leading from the as-is state to the as-to-be state; B - the Actor interacting with the System through the UI by means of CRUD+SV and Confirm operations, the System responding with Display (of data or of a problem), operating on internal and external data.]
Figure 7.3: A - Model of real world, B - Model of IT system.
has been added to our model. Moreover, the System may display data or may inform
the Actor about the occurrence of a problem - that is reflected in our model via the
Display operation. The behavior of the Actor and the System may depend on time,
i.e. performing some operations might take too much time and result in a timeout
or it might be too late for some operations to be completed. To some extent, the
model of IT system resembles Albrecht’s model [11] on which the function points
method [12] is based. Unfortunately, Albrecht’s model cannot be directly used for
inference about events. For instance, some events are strongly associated with the
delete operation, which is not directly represented in Albrecht's model. There is a similar problem with the confirm operation.
It is assumed that every step of a use case can be represented as a triplet: Actor,
Activity, Information Object [9, 109]. Those three notions, along with the notion of
Event, are described below.
Actor
Actor represents an abstract entity which performs an action in a given step. There
are two types of actors: SYSTEM - represents the system being built; USER - represents one of the end-users of the system. In the example from Figure 7.1, Author, who is an actor in steps 1, 2, 4 and 5, has the USER type. System, which is an actor in steps 3 and 6, is of the SYSTEM type. Every actor of type USER needs to be declared as such by Analyst.
Activity
Activity is a process triggered by Actor and it manipulates Information Objects (In-
formation Objects are described below). To identify events that can appear when a
step is executed (i.e. activity described by it is performed), we have decided to iden-
tify abstract types of activities. The types of activities were derived from the model
of IT system presented in Figure 7.3.
Assumption #1:
Each step of a use case can be classified as one of the activity types presented in Table
7.1.
JUSTIFICATION: Some of the proposed activity types are directly related to the CRUD+SV operations from the model of IT system. Most of the CRUD+SV operations are simple from the point of view of the presented model of real world, and each of them is assigned a single activity type. For the sake of simplicity, we name such activity types after the operations corresponding to them. Here are the activity types corresponding to simple operations: READ, UPDATE, DELETE, SELECT, VALIDATE.
The Create operation is not simple: it exhibits susceptibility to different events depending on the object it manipulates. We have identified three activity types that can be assigned to this operation:
• ADD - when a new Information Object is added,
• ENTER - when data concerning part of Information Object is entered,
• LINK - when a connection between Information Objects is established.
Moreover, the confirm operation from the model of IT system is represented by CONFIRM activity
type and the display operation is represented by DISPLAY activity type.
o
Information Object
Information Object represents an object the actor operates on. The use case pre-
sented in Figure 7.1 operates on a post which is an example of Information Ob-
ject. Information Object is assigned a set of properties that are important from
the point of view of event identification. Table 7.2 presents a set of properties we
have found helpful when identifying events. The properties are divided into two
Activity type Exemplary use-case steps
ADD Author adds new blog post.
ENTER Author enters title of the post.
LINK Author links the comment to the post.
READ Author browses available blog posts.
UPDATE Author changes the title of the post.
DELETE Author deletes the post.
SELECT Author selects a blog post from the available ones.
VALIDATE Author checks the blog post.
CONFIRM Author confirms the changes.
DISPLAY System presents available blog posts.
Table 7.1: Identified types of activities of use-case steps.
sets called Activity-Free Properties (AFP) and Activity-Sensitive Properties (ASP). The former are valid for an Information Object no matter which activity is performed on this object, e.g. Article always has the SET property, no matter what operation is performed on it. These properties correspond to the elements of the models presented in Figure 7.3: Set and Compound Items. Moreover, there might be some cardinality constraints imposed on the object, hence two additional properties have been added: AT_LEAST (the smallest number of elements) and NO_MORE_THAN (the highest number of elements). Furthermore, some objects can be stored in an external system. This kind of objects should be marked with the REMOTE property.
ASP properties depend on Information Object and an activity performed on it.
ASP properties are related to the time aspect of operations of the model of IT system.
Assume an invoice can be exported and also some other operation can be performed on it. If only the export operation can take too long, which can lead to specific events, then the information object Invoice will have the LONG_LASTING property only for the export operation.
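In implementation terms, AFP and ASP can be viewed as simple mappings from information objects to properties. The sketch below is illustrative only; the object and property names are taken from the examples above, and "EXPORT" stands in for whichever abstract activity type the export operation maps to:

```python
# Activity-Free Properties: hold for an object regardless of the activity.
AFP = {
    "Article":      {"SET"},
    "CreditCard":   {"COMPOUND", "REMOTE"},
    "Participants": {"SET", "AT_LEAST", "NO_MORE_THAN"},
}

# Activity-Sensitive Properties: pairs <P, A> - property P holds only for activity A.
ASP = {
    "Invoice":   {("LONG_LASTING", "EXPORT")},   # export may take a long time
    "Admission": {("TIMEOUTABLE", "ADD")},       # adding has a deadline
}

def afp(obj):
    """Properties of `obj` that are independent of the activity."""
    return AFP.get(obj, set())

def asp(obj):
    """<P, A> pairs of `obj`; P exhibits itself only during activity A."""
    return ASP.get(obj, set())

print(afp("CreditCard"), asp("Invoice"))
```

Declaring these mappings is Analyst's job during preprocessing; the inference rules of Section 7.3.3 then consult them.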
7.3.3 Events
Assumption #2:
Each event which can occur in a use case can be classified as one of the event types
presented in Table 7.3.
JUSTIFICATION: To identify event types one can use a HAZOP-like brainstorming session (see Sec.
6.3.1). As primary keywords one can choose activity types of Table 7.1. The CRUD+SV activities should
be considered in the following six variants:
{Internal data, External data} x {Simple, Compound, Set}
The secondary keywords can be the same as proposed in the HAZOP method (NO, MORE, etc. - see
Section 6.3.1). One can go through the list of primary keywords (i.e. all the variants of activity types),
Property - Description - Example
Activity-Free Properties (AFP):
SET - represents objects which form a set (they have to be unique) - Article
COMPOUND - represents objects which consist of other objects - Credit Card (consists of holder name, number, expiration date, security code)
AT_LEAST - represents objects whose quantity must be at least X - Participants (number of which must be at least 10)
NO_MORE_THAN - represents objects whose quantity must be no more than X - Participants (number of which cannot be larger than 20)
REMOTE - represents external objects - Credit Card (in order to process information concerning Credit Card, System needs to contact an external payment service)
Activity-Sensitive Properties (ASP):
LONG_LASTING for activity X - given activity X for this kind of objects can take a significant amount of time - Invoice (export of which may take a significant amount of time)
TIMEOUTABLE for activity X - given activity X for this kind of objects is not possible after some deadline - Admission (there might be a deadline for admissions to the university)
Table 7.2: Identified properties of information objects together with their descriptions and examples.
one by one, and for each one consider all the secondary keywords to identify possible event types. A
list of identified event types is maintained through the brainstorming session. After the brainstorming,
that list is analyzed, item by item, to:
• remove repeated or undetectable event types,
• clarify their descriptions,
• find examples of a proposed event type in the available real-life use cases.
The results of such a process are presented in Table 7.3. Interestingly, four HAZOP secondary keywords (AS WELL AS, PART OF, BEFORE, AFTER) did not contribute to any of the identified event types (they seem useless in this process).
o
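The keyword grid traversed in such a session can be enumerated mechanically; the sketch below is illustrative, and the secondary-keyword list shown is an assumed subset of the standard HAZOP guide words (see Section 6.3.1):

```python
from itertools import product

# Primary keywords: each CRUD+SV operation considered in six variants.
operations = ["Create", "Read", "Update", "Delete", "Select", "Validate"]
variants = list(product(["Internal data", "External data"],
                        ["Simple", "Compound", "Set"]))

# Secondary keywords (assumed subset of the HAZOP guide words).
secondary = ["NO", "MORE", "LESS", "REVERSE", "OTHER THAN", "EARLY", "LATE"]

# Every (operation, variant, secondary keyword) triple is one brainstorming prompt.
prompts = list(product(operations, variants, secondary))
print(len(variants), len(prompts))
```

Even this small grid yields a few hundred prompts, which explains why the manual session is laborious and why the resulting list has to be pruned afterwards.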
The process of identification of events in use-case steps is based on inference
rules formulated after the analysis of real-life use cases. The rules are expressed in a
form similar to the propositional logic, i.e.:
    premise
    ──────────
    conclusion
By analysing available use cases we have discovered two inference rules called sim-
ple and compound. In order to keep the rules brief and clear the following symbols
are used: R for role (i.e. actor type), A for activity type, O for information object, P
for property of information object, E for event type.
Simple inference rule of events identification
The following notions are used by the simple inference rule:
Type of event - Description - Examples of real-life events
WrongData - provision of wrong/invalid data - ISBN number is invalid
TooLittleLeft - there is not enough of information object - Bank account is empty
Incomplete - not all data has been provided - ISBN of a book has not been provided
AlreadyExist - given data already exist - Book with the given ISBN already exists
ConnectionProblem - problem with the connection with a remote system - Could not connect to the credit card operator
TooMuchSelected - too much data has been provided - Too many subjects selected
NoObjectSelected - no data has been selected from the given options - No subject selected
AlreadySelected - given data has already been selected - The subject has already been selected
TooLittleSelected - too little data has been provided - Too few subjects selected
DoesNotExist - given data does not exist - Given article does not exist
LackOfConfirmation - operation has not been confirmed - Author does not confirm
NoDataDefined - there is no information object of a given type defined - There are no subjects defined
WantsToInterrupt - actor wants to interrupt an operation - Author cancels export operation
TooLateForDoingThis - it is too late to perform given operation - Deadline for adding new subjects has passed
WantsToWithdraw - given operation is withdrawn - NOT FOUND in the analysed UCs
NoDataProvided - no data was provided - NOT FOUND in the analysed UCs
LateConfirmation - confirmation was performed too late - NOT FOUND in the analysed UCs
TooEarlyForDoingThis - it is too early to perform given operation - NOT FOUND in the analysed UCs
TooMuchDataProvided - too much data was provided - NOT FOUND in the analysed UCs
Table 7.3: Identified event types together with their descriptions and examples.
• Observed is a finite set of axioms. Each axiom has the form of a quadruple <R, A, P, E> and it denotes an observation that an activity of type A performed by an actor of type R on an information object with property P can lead to an occurrence of an event of type E. The Observed set of axioms we have identified so far is presented in Table 7.4. Obviously, this set can be easily extended by adding new axioms resulting from further research.
• AFP (stands for Activity-Free Property) is a function. Its domain is a finite set
of information objects described in a requirements specification document
(see Section 7.2). This function returns a finite set of properties of an infor-
mation object.
It is assumed that every use-case step, originally written in natural language, can be
represented by the following triplet: actor type, activity type and information object.
It means that each step is a simple sentence in the form of Subject-Verb-Object. This
assumption is supported e.g. by Cockburn’s guideline [31] which says that every step
structure should be simple and consist of subject, verb and object. Therefore, a use-case step is represented as S = <R, A, O>. Transformation of an original step to its abstract representation <R, A, O> is done with the help of NLP tools (as described in Section 7.4).
The simple inference rule assumes that properties of information objects de-
pend solely on information object, not on an activity. As events can appear but do
not have to, the rule uses the ♦ symbol from modal logic to denote this. The rule is
presented below:
    S = <R, A, O>,  P ∈ AFP(O),  <R, A, P, E> ∈ Observed
    ─────────────────────────────────────────────────────     (7.1)
    S ⊨ ♦E
An example of applying this rule is presented in Figure 7.4.
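Rule (7.1) translates almost literally into code. The following minimal sketch (our own encoding) uses a four-axiom excerpt of the Observed set from Table 7.4 and an AFP mapping declared by the analyst:

```python
# Excerpt of the Observed set: axioms <R, A, P, E> from Table 7.4.
OBSERVED = {
    ("USER", "ENTER", "SET",      "WrongData"),
    ("USER", "ENTER", "COMPOUND", "Incomplete"),
    ("USER", "ENTER", "SET",      "AlreadyExists"),
    ("USER", "ENTER", "REMOTE",   "ConnectionProblem"),
}

AFP = {"Post": {"COMPOUND"}}   # activity-free properties per information object

def simple_rule(step):
    """Rule (7.1): S |= possibly E  if  P in AFP(O) and <R, A, P, E> in Observed."""
    role, activity, obj = step                       # S = <R, A, O>
    return {e for (r, a, p, e) in OBSERVED
            if r == role and a == activity and p in AFP.get(obj, set())}

print(simple_rule(("USER", "ENTER", "Post")))        # the step from Figure 7.4
```

With the full Observed set of Table 7.4 and the complete AFP function, the same comprehension yields all events the simple rule can derive for a step.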
Compound inference rule of use-case events
Unfortunately, in some cases the simple inference rule is too weak. Some events
depend not only on an information object but also on an activity performed on it. To
cope with this, the compound inference rule has been introduced. This rule uses the
notion of activity-sensitive properties (ASP for short). It is a function whose domain
is a finite set of information objects (as in the case of AFP). The function returns a
finite set of pairs < P, A >, where P denotes a property of information object and A
denotes an activity type that has to take place in order for property P to exhibit itself.
The compound inference rule is presented below:
    S = <R, A, O>,  <P, A> ∈ ASP(O),  <R, A, P, E> ∈ Observed
    ──────────────────────────────────────────────────────────     (7.2)
    S ⊨ ♦E
To show how the above rule can be used let us extend the example from Figure 7.4.
Assume that the students have to submit their posts by a certain deadline. In this
case, knowing this, Analyst should classify post as an information object which is
TIMEOUTABLE for the write in activity (write in corresponds to the ENTER activity
type). As a result of parsing the declaration of information objects, the ASP function
is augmented with the following item:
<Post, ENTER, TIMEOUTABLE>.
When it comes to parsing the step Student writes in a post, it is translated to the following triplet:
<USER, ENTER, Post>.
Actor (R) - Activity (A) - Information Object Property (P) - Event type (E)
USER - ENTER - SET - WrongData
USER - ENTER - COMPOUND - Incomplete
USER - ENTER - SET - AlreadyExists
USER - ENTER - REMOTE - ConnectionProblem
USER - ENTER - AT_LEAST - TooLittleLeft
SYSTEM - DISPLAY - SET - NoDataDefined
USER - SELECT - SET - NoDataDefined
USER - SELECT - SET - AlreadySelected
USER - SELECT - NO_MORE_THAN - TooMuchSelected
USER - SELECT - * - NoObjectSelected
USER - DELETE - * - DoesNotExist
USER - READ - * - DoesNotExist
USER - LINK - * - Incomplete
USER - SELECT - AT_LEAST - TooLittleSelected
USER - CONFIRM - * - LackOfConfirmation
SYSTEM - any of ADD, ENTER, LINK, READ, UPDATE, DELETE, SELECT, VALIDATE, CONFIRM, DISPLAY - LONG_LASTING - WantsToInterrupt
USER - any of ADD, ENTER, LINK, READ, UPDATE, DELETE, SELECT, VALIDATE, CONFIRM, DISPLAY - TIMEOUTABLE - TooLateForDoingThis
Table 7.4: Elements of the Observed set; the star means that the row is valid for any value of the properties of the information object.
Since the Observed set contains the quadruple
<USER, ENTER, TIMEOUTABLE, TooLateForDoingThis>
and ASP(Post) contains
<TIMEOUTABLE, ENTER>,
the following conclusion is inferred: the event TooLateForDoingThis can happen when executing the step.
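The same example can be replayed in code. The sketch below (our own encoding, mirroring the simple-rule sketch) implements rule (7.2) with the single axiom and ASP entry used above:

```python
# The quadruple <R, A, P, E> from the Observed set used in this example.
OBSERVED = {("USER", "ENTER", "TIMEOUTABLE", "TooLateForDoingThis")}

# ASP(Post) contains the pair <TIMEOUTABLE, ENTER>.
ASP = {"Post": {("TIMEOUTABLE", "ENTER")}}

def compound_rule(step):
    """Rule (7.2): S |= possibly E  if  <P, A> in ASP(O) and <R, A, P, E> in Observed."""
    role, activity, obj = step                       # S = <R, A, O>
    return {e for (r, a, p, e) in OBSERVED
            if r == role and a == activity
            and (p, activity) in ASP.get(obj, set())}

print(compound_rule(("USER", "ENTER", "Post")))
```

Note the single difference from the simple rule: the property must be paired with the very activity of the step, which is exactly what makes the rule activity-sensitive.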
[Figure: from the declarations of actors (Student, Teacher) and information objects (Exam; Post with the COMPOUND property), the use-case step "Student writes in a post." is translated by Natural Language Processing into R = USER, A = ENTER, O = Post (with COMPOUND property); the inference engine, consulting the Observed set, identifies the event type Incomplete.]
Figure 7.4: An example of inferring events for step Student writes in a post.
7.4 Natural Language Processing of Use Cases
The presented inference engine works with the elements of the abstract model of use-case steps: actor type, activity type and information object. Before the engine can be used, every use case needs to be analysed with NLP tools and the elements of the mentioned model have to be identified.
7.4.1 Preprocessing
UC Workbench [88] is a tool used for use-case management. With this tool, Analyst can enter and manage his/her use-case-based requirements specification. The entered descriptions of actors and information objects are automatically transformed into a dictionary of actors and information objects.
[Figure: a use-case step passes through the chain Sentence Splitter → Tokenizer → Parts-of-speech analyzer → Lemmatizer → Grammatical-relations analyzer → Model element reference marker → Activities lookup tool, producing in turn sentences, words, parts of speech, lemmas, grammatical relations, references to actors and information objects, and activity types; actors' names, information objects' names and activity synonyms are auxiliary inputs.]
Figure 7.5: The chain of the used NLP tools: 1 - input provided by the analyst in a natural language (italic font), 2 - tool, 3 - output from one tool and input for another tool, 4 - output from the chain (bold font).
7.4.2 NLP
Additional plugins for UC Workbench performing natural language analysis were de-
veloped in order to analyse the entered use cases. These plugins are organized ac-
cording to the Chain of Responsibility design pattern [45]. It means that the text of a
use-case step goes through all of the tools defined in the chain. Moreover, an output
of one tool can be used as an input by another tool. Tools providing the following
operations are used in order to analyse a use-case step: sentence segmentation [24],
word tagging [4], part-of-speech tagging [3], lemmatization [64], dependency rela-
tions tagging [37]. The chain of the used NLP tools is presented in Figure 7.5. After
the NLP processing, each step is annotated with the elements of the step abstract
model: actor type, activity type, information object.
The presented NLP layer is language-independent, meaning that this approach could be used for any natural language, provided that adequate tools are available for the given language. In our research, use cases were written in English and they were analysed with the tools contained in the Stanford Parser kit [37] and the Open NLP project [1]. However, any other tools realising the above-mentioned NLP operations could be used, e.g. French parsers [48] and German ones [101].
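The Chain of Responsibility arrangement described above can be sketched as a list of stages that each read and extend a shared annotation record. The three toy stages below are drastic simplifications of the tools in Figure 7.5, not the actual plugins:

```python
def sentence_splitter(text, ann):
    ann["sentences"] = [s.strip() + "." for s in text.split(".") if s.strip()]

def tokenizer(text, ann):
    # Consumes the output of the previous tool in the chain.
    ann["tokens"] = [w.rstrip(".") for s in ann["sentences"] for w in s.split()]

def reference_marker(text, ann):
    # Marks references to declared actors (dictionary from the preprocessing phase).
    actors = {"Student", "Teacher"}
    ann["actor_refs"] = [w for w in ann["tokens"] if w in actors]

CHAIN = [sentence_splitter, tokenizer, reference_marker]

def analyse(step_text):
    ann = {}
    for tool in CHAIN:            # each tool may use what earlier tools produced
        tool(step_text, ann)
    return ann

print(analyse("Student writes in a post.")["actor_refs"])
```

The design choice matters: because every tool only appends annotations, swapping in a parser for another language means replacing individual stages without touching the rest of the chain.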
7.5 Generation of Event Descriptions
The inference engine described in Section 7.3 generates abstract types of events (depicted in Table 7.3). Those event types (e.g. TooLittleLeft) are like raw material and need further processing. Analyst needs to obtain a description of an event which is expressed in natural language and which is easily understood by the potential readers. Therefore, Natural Language Generation (NLG) was used as the final stage of events identification.
For every abstract event type a template for generation of event description has
been proposed. The templates together with examples are presented in Table 7.5. A
template consists of constant string literals and items from the use-case step (avail-
able after NLP phase): subject, verb, object. Moreover, additional phrase features
are included in the templates.
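Conceptually, a template instantiation is just slot filling plus inflection. The sketch below fills the WrongData template from Table 7.5 with a naive regular past-tense rule; it is our own illustration, whereas the actual tool delegates inflection and other phrase features to SimpleNLG:

```python
def past_tense(verb):
    # Naive regular inflection only ("enter" -> "entered"); SimpleNLG handles
    # irregular verbs and other phrase features in the actual tool.
    return verb + "d" if verb.endswith("e") else verb + "ed"

def wrong_data_description(subject, verb, obj):
    # Template: <subject> <verb [past tense]> wrong data of <object>
    return f"{subject} {past_tense(verb)} wrong data of {obj}."

print(wrong_data_description("Customer", "enter", "credit card"))
```

The subject, verb and object slots come straight from the NLP phase of Section 7.4, which is why the templates can be filled without any further analysis of the step.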
For the task of NLG, a Java library called SimpleNLG [46], which provides a data-
to-text mechanism, was used. SimpleNLG offers an API which allows for the set-
ting of semantic input for a phrase and direct control over the realisation process of
phrase generation (by setting various values of phrase features, e.g. tense, voice).
Based on this, a prototype tool combining all three stages of event generation
(NLP analysis, inference engine, NLG) was implemented. The tool was built as an
additional plugin for UC Workbench. Section 7.6 presents empirical evaluation of
the accuracy of the prototype tool and the understandability of descriptions of the
identified events.
7.6 Empirical evaluation
As mentioned in Section 7.5, a prototype tool combining NLP, inference engine, and
NLG was implemented as part of the UC Workbench project. This prototype has
been evaluated from three perspectives: accuracy of events identification (together with precision), speed of analysis of use-case steps, and understandability of the generated event descriptions.
7.6.1 Reference set of events
For the evaluation of the prototype, we have used experimental data concerning accuracy of events identification performed by humans [65]. Two experiments were conducted, one with 18 students of the Software Engineering programme at Poznan University of Technology, and the second with 64 IT professionals. The designs of both experiments were the same. The aim of the participants of both experiments was to identify as many events as possible for the main scenarios from the benchmark use-
Table 7.5: Information used for generation of descriptions of events. A template for every event type is presented, together with the exemplary data used for the generation of event description.
Event type - Event template - Exemplary use-case step (Subject / Verb / Object) - Generated description
WrongData - <subject> <verb [past tense]> wrong data of <object> - Customer enters credit card details. (Customer / enter / credit card) - Customer entered wrong data of credit card.
TooLittleLeft - Not enough <object [plural]> left - Customer enters number of books. (Customer / enter / book) - Not enough books left.
Incomplete - <subject> <verb [past tense, active voice]>, incomplete data of <object [singular]> - Author writes in an article. (Author / write in / article) - Author wrote in incomplete data of article.
AlreadyExist - <object> already exists. - Author writes in an article. (Author / write in / article) - Article already exists.
ConnectionProblem - Connection problem occurred. - Customer enters credit card details. (Customer / enter / credit card) - Connection problem occurred.
TooMuchSelected - Too many <object> <verb [past tense, passive voice]> by <subject> - Student selects books. (Student / select / book) - Too many books selected by Student.
NoObjectSelected - No <object> <verb [past tense, passive voice]> by <subject> - Student chooses a book. (Student / choose / book) - No book chosen by Student.
AlreadySelected - <object> has already been <verb [past participle]> by <subject> - Student selects a subject. (Student / select / subject) - Subject has already been selected by Student.
TooLittleSelected - Too little <object [plural]> <verb [past tense, passive voice]> by <subject> - Student chooses subjects. (Student / choose / subject) - Too little subjects chosen by Student.
DoesNotExist - Given <object> does not exist. - Author deletes an article. (Author / delete / article) - Given article does not exist.
LackOfConfirmation - <subject> <verb [present perfect tense, negative]> - Author confirms the operation. (Author / confirm / operation) - Author has not confirmed.
NoDataDefined - No <object> has been defined - System displays available articles. (System / display / article) - No article has been defined.
WantsToInterrupt - Operation canceled. - System imports invoice data. (System / import / invoice) - Operation canceled.
TooLateForDoingThis - Too late for this operation. - Author enters an article. (Author / enter / article) - Too late for this operation.
7.6. Empirical evaluation 95
case-based specification [14]. The identified events in the experiments were anal-
ysed by two experts and a set of sensible events for steps from the benchmark spec-
ification was distinguished. This set was used as a reference set of events for the
evaluation of the prototype tool.
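To make the template mechanism of Table 7.5 concrete, the sketch below instantiates an event template from step elements. The function name and the hand-written inflection table are assumptions for this example, not the actual UC Workbench implementation; in the prototype, inflection and realisation were done with SimpleNLG.

```python
# Illustrative sketch of filling a Table 7.5-style event template.
# The inflection table and function name are assumptions for this example;
# the actual prototype used SimpleNLG for inflection and realisation.

PAST_TENSE = {"enter": "entered", "choose": "chose", "select": "selected"}

def render_event(template: str, subject: str, verb: str, obj: str) -> str:
    """Fill a Table 7.5-style template with elements extracted from a step."""
    text = template
    text = text.replace("<subject>", subject)
    text = text.replace("<verb [past tense]>", PAST_TENSE[verb])
    text = text.replace("<object>", obj)
    return text + "."

# WrongData example from Table 7.5:
template = "<subject> <verb [past tense]> wrong data of <object>"
print(render_event(template, "Customer", "enter", "credit card"))
# -> Customer entered wrong data of credit card.
```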
7.6.2 Sensible events
Use-case steps are written in a natural language; therefore, their interpretation can
differ from reader to reader. From the perspective of use-case events this means that
different people might identify different events for a given use-case step. Some of
the identified events, although being correct in terms of language correctness, might
not make sense in the context of the use case. For example, for step 3
from Figure 7.1 the following event could be identified: Author answers a phone call.
This event is correct from the language perspective; however, it does not make
sense for the given step. The events which make sense in the context of a given use-
case step are desirable, and we will call them sensible.
7.6.3 Measures
The following measures were used in order to evaluate the prototype:
• Accuracy (of events identification) - the proportion between the number of
generated sensible event descriptions and the total number of events from
the reference set of events1. It was measured using the following equation:
Accuracy = \frac{\sum_{s=1}^{n} \#events(s)}{\sum_{s=1}^{n} \#Events(s)}   (7.3)

Where:

– n is the total number of steps in the benchmark specification;
– #events(s) is the number of distinct sensible events identified by the
prototype tool for the step s;
– #Events(s) is the number of distinct sensible events identified by all par-
ticipants in the experiments for the step s.
• Precision - the proportion of the identified events that were sensible. It was
measured using the following equation:

Precision = \frac{TP}{TP + FP}   (7.4)
The symbols TP, FP are defined by the confusion matrix presented in Figure
7.6.
1 This definition differs from the definition used in the domain of information retrieval. However, we wanted to be consistent with the definitions and terminology used in Section 6.4.2.
• Speed - the number of steps analysed in a given time. It was
measured using the following equation:

Speed = \frac{\#steps}{T} \; [steps/minute]   (7.5)

Where:

– #steps is the total number of steps analysed by the prototype;
– T is the total time spent on analysing the steps by the prototype.
                     Expert answer
                     y      n
Tool answer    Y     TP     FP
               N     FN     TN

Figure 7.6: Confusion matrix. T (true) - tool answer that is consistent with expert decision. F (false) - tool answer that is inconsistent with expert decision. P (positive) - tool positive answer (event was identified). N (negative) - tool negative answer (event was not identified).
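Under the definitions above, the three measures can be sketched as follows. The function names and the toy numbers are illustrative assumptions, not the experimental data.

```python
# Sketch of the three evaluation measures (Equations 7.3-7.5).

def accuracy(tool_events_per_step, reference_events_per_step):
    """Eq. 7.3: distinct sensible events generated by the tool, summed over
    all steps, divided by the total size of the reference set of events."""
    return sum(tool_events_per_step) / sum(reference_events_per_step)

def precision(tp: int, fp: int) -> float:
    """Eq. 7.4: share of generated events that were sensible (true positives)."""
    return tp / (tp + fp)

def speed(num_steps: int, minutes: float) -> float:
    """Eq. 7.5: number of analysed steps per minute."""
    return num_steps / minutes

print(accuracy([2, 1, 3], [3, 2, 3]))  # -> 0.75
print(round(precision(28, 2), 2))      # -> 0.93
print(speed(150, 15))                  # -> 10.0 steps per minute
```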
7.6.4 Procedure
The prototype tool was run for the whole benchmark specification [14]. For every
use-case step from the specification the time of the analysis by the tool was mea-
sured separately 2. After the analysis had been finished, the generated event de-
scriptions were evaluated by an expert. A similar procedure was used as in the
experiments from Section 6.4: the generated events were assigned to different ab-
stract classes of events. Based on this analysis, the identified events were divided
into sensible and insensible ones. Moreover, the measures from the confusion ma-
trix (presented in Figure 7.6) were calculated.
7.6.5 Results
Figure 7.7 presents the results for the prototype and for two methods used for events
identification in the experiments described in Chapter 6. H4U was a HAZOP-like
method described in Section 6.3.2. Ad hoc was a simple approach to events iden-
tification in which the participants of the experiments did not use any particular
method. Precision was not calculated for the H4U and ad hoc
approaches; hence, these results are not presented. The initial accuracy of the
prototype was equal to 0.73; however, after the analysis of the results it was found
2 The analysis was performed 10 times; the average times are presented. The following hardware was used to perform the measurements: MacBook Air, CPU: 1.8 GHz Intel Core i7, RAM: 4 GB 1333 MHz DDR3, HDD: 256 GB SSD.
that the initial Observed set used in the inference engine (see Section 7.3) could be
extended by adding new elements to it. After adding four new elements to the Ob-
served set (these elements are presented in Table 7.6) the accuracy rose to 0.89.
             Prototype     H4U Max   H4U Avg   Ad hoc Max   Ad hoc Avg
Accuracy     (0.73) 0.89   0.44      0.26      0.28         0.18
Precision    0.93          NA        NA        NA           NA
Speed        10.8          2.32      0.85      2.65         2.58
Figure 7.7: Results for the prototype and for H4U and ad hoc approaches.
Actor (R)   Activity (A)   Information Object Property (P)   Event type (E)
SYSTEM      FETCH          REMOTE                            ConnectionProblem
USER        UPDATE         SET                               WrongData
USER        ENTER          COMPOUND                          Incomplete
USER        DELETE         AT_LEAST                          TooLittleLeft
Table 7.6: Elements added to the Observed set based on analysis of the results of the experiment.
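The role of such entries can be sketched as a lookup from observed step characteristics to candidate event types. The tuple structure and names below are assumptions inferred from Table 7.6, not the actual inference-engine code described in Section 7.3:

```python
# Hypothetical sketch: Observed-set entries map (actor, activity, property)
# patterns to event types, following the columns of Table 7.6.
OBSERVED = {
    ("SYSTEM", "FETCH", "REMOTE"): "ConnectionProblem",
    ("USER", "UPDATE", "SET"): "WrongData",
    ("USER", "ENTER", "COMPOUND"): "Incomplete",
    ("USER", "DELETE", "AT_LEAST"): "TooLittleLeft",
}

def candidate_event_types(actor: str, activity: str, prop: str):
    """Return event types whose Observed-set pattern matches the step."""
    return [event for (a, act, p), event in OBSERVED.items()
            if (a, act, p) == (actor, activity, prop)]

print(candidate_event_types("SYSTEM", "FETCH", "REMOTE"))
# -> ['ConnectionProblem']
```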
7.6.6 Interpretation
The presented results for the implemented prototype and the two manual approaches
to events identification show that the accuracy and speed of events identification
can be greatly improved by using an automated tool.

The tool was not able to generate all of the possible sensible events because
some steps did not contain direct references to the information objects (instead
words like information or data were used). Another reason concerned events related
to the domain of the benchmark specification (a university admission system). Such
events could be identified only by domain experts. For example, the step
Student links subjects to the majors can be interrupted by the event Too little
humanities subjects were chosen, which reflects a business rule known only to a
domain specialist.
The higher accuracy of the tool does not come at the cost of low precision. The
high level of precision (0.93) reached by the tool means that only a small number
of the generated event names were not sensible. This shows that the tool does not
produce a large overhead which, otherwise, might lead to additional work of
rejecting unnecessary events by the Analyst.

The speed is another advantage of the tool. It was able to generate events for the
whole benchmark specification (consisting of over 150 use-case steps) in under
20 minutes, while only five participants of the experiments were able to analyse the
whole specification within 60 minutes.
7.6.7 Turing test
Every generated event description was analysed by an expert in order to evaluate if
the given event name made sense. However, a question arises if the generated event
descriptions are easy to understand by people. In order to evaluate the understand-
ability of the generated event descriptions, an experiment based on the assumptions
of a Turing test [117] was conducted.
Experiment design
This section presents the design of the conducted experiment.
Participants The experiment was conducted with 12 students of the Software Engineer-
ing programme at Poznan University of Technology. The students were familiar with
the concept of use cases and use-case events. All of the classes of the Software En-
gineering programme are conducted in English; hence, it was assumed that the stu-
dents could understand text written in English.
Experimental material The material used in the experiment consisted of two parts: a
form and a questionnaire. The form held a set of 20 pairs of descriptions of use-case
events (see Appendix A). Every pair consisted of an event description generated by
the tool and an event description identified by a human in the experiments mentioned
earlier; in the case of the latter, the best available description (from the language
point of view) was chosen, based on the expert evaluation. The event descriptions
in every pair represented the same abstract event. The order of the event descriptions
in the pairs was random. Figure 7.8 presents an example of a pair of event descriptions used in
the experiment. The task of each subject was to decide which of the event descrip-
tions from the given pair is better from the linguistic point of view. The participants
could choose one of the following options:
• event A is much better than event B
• event A is slightly better than event B
• it is hard to say which event is better
• event B is slightly better than event A
• event B is much better than event A
The questionnaire was prepared as a measurement instrument for controlling
possible influence factors such as experience with use cases, understandability of
the events, etc.
Procedure Before the main experiment a preliminary experiment (with 3 partic-
ipants) was conducted in order to evaluate the proposed experimental material and
the time required to perform the task. Some of the comments from the participants
of this preliminary experiment were incorporated into the final version of the mate-
rials. The results of this preliminary experiment were not taken into account in the
final results. The main experiment started with a 5-minute introduction to the ex-
periment. The experimenter demonstrated the form and the task. The participants
were not informed that half of the event descriptions had been generated by a tool. The
participants were asked to read the form and perform the task. They were also asked
to note down the start and finish times. The participants were given 30 minutes to
finish the task. After finishing the task, the participants were asked to fill out the
questionnaire. At the end, all the materials were collected by the experimenter. In
total, the experiment was finished in 35 minutes.

Use-case step: Candidate chooses a major that he/she wants to pay for.
Event A: No major was chosen by Candidate.
Event B: Candidate didn’t choose any major.

Figure 7.8: Example of pair of events used in the Turing test experiment.
Results In the experiment a 5-point scale was used to evaluate the pairs of
events. However, in order to evaluate the results, a 3-point scale was used. For every
pair of events the following was counted:

• how many participants marked the event description generated by the
tool as much better or slightly better than the event identified by a human;

• how many participants could not tell the difference between the event descrip-
tions;

• how many participants marked the event identified by a human as much or
slightly better than the automatically generated event description.
The results for every pair of event descriptions are presented in Figure 7.9. Table 7.7
presents descriptive statistics for the aggregated numbers of participants choosing
one type of events (either the generated ones by the tool or the identified ones by
humans).
          Average   Standard deviation
Tool      6.3       2.85
Humans    4.15      2.66
Table 7.7: Statistics of the results of the Turing test experiment.
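The collapse from the 5-point answer scale to three vote categories can be sketched as below, assuming (for illustration only) that event A in a pair is the tool-generated description; the answer encoding and function name are assumptions, since the actual pair order was randomised.

```python
# Sketch of collapsing the 5-point answer scale to three vote categories,
# assuming event A in the pair is the tool-generated description.

def tally(answers):
    """Count tool-preferred ('T'), undecided ('?') and human-preferred ('H')."""
    to_category = {"A>>B": "T", "A>B": "T", "A?B": "?", "B>A": "H", "B>>A": "H"}
    counts = {"T": 0, "?": 0, "H": 0}
    for answer in answers:
        counts[to_category[answer]] += 1
    return counts

print(tally(["A>>B", "A>B", "A?B", "B>A", "A>B"]))
# -> {'T': 3, '?': 1, 'H': 1}
```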
Interpretation From the presented results it can be observed that in 12 cases
more participants considered the generated event descriptions better than those
provided by humans. In 7 cases more participants preferred the events identified
by humans. In 1 case (event pair id = 5) the numbers were equal. On average, about 6
participants voted for the event descriptions generated by the tool and only about 4
participants voted for the events identified by humans. It can be concluded that the
implemented tool was able to generate event descriptions which might be used in a
real-life use-case-based specification.

Figure 7.9: Results of the Turing test experiment. Event pairs are sorted according to the number
of participants preferring generated event descriptions (ascending order).
7.7 Related work
A number of authors, including Cox [35], Mar [83], Wiegers [120], advocate focusing
on specification of events and exceptional scenarios. Alexander [15] underlines that
“exception cases can often outnumber normal cases and can drive project costs”.
Experiments conducted by him show that using scenarios helps in the identification
of exceptions and events; however, he does not propose any particular method for
events identification. Cox et al. [35] propose the following question in the check-
list for improving requirements quality: “Alternatives and exceptions should make
sense and should be complete. Are they? “. Nevertheless, they do not present any
technique which could help in achieving completeness of events. Similarly, Wiegers
[120] suggests the following question in his inspection checklist “Are all known ex-
ception conditions documented?”.
Letier et al. [80] presented a technique for discovering unforeseen scenarios
which might have harmful consequences if they are not handled by a system. Their
method is focused on looking for implied scenarios and can help in the elabora-
tion of additional system requirements. The presented case studies show that the
method can give good results; however, it cannot be used with use cases written in
a natural language, as it assumes Message Sequence Charts are used as a scenario-
based specification language. Moreover, the proposed technique is manual and re-
quires vast domain knowledge.
Lamsweerde et al. [118] propose an approach for discovering and handling ob-
stacles in requirements. They claim that very often requirements specifications pres-
ent ideal situations and omit abnormal situations which, if not handled properly,
may lead to system failures or unexpected behaviors. Lamsweerde presents tech-
niques which work on requirements defined as users’ goals expressed in a formal
language called KAOS [79]. The proposed techniques allow for systematic identifica-
tion of obstacles from goal specifications. Because the method is suited
for KAOS-based requirements, it allows for inference about possible obstacles. How-
ever, this strict level of formality makes the requirements hard to write and even
harder to read for IT laymen.
Makino et al. [82] propose a method for generation of exceptional scenarios.
Their approach is based on the analysis of differences between slightly different nor-
mal (positive) scenarios. This method assumes that similar normal scenarios should
share similar abnormal scenarios. Although this method can help in automatic anal-
ysis of exceptional scenarios, it works only for requirements written with a scenario
language named SLAF [52], which assumes limited vocabulary and grammar.
Sutcliffe et al. [115] propose an approach based on automatic generation of dif-
ferent scenarios from defined use cases. Those scenarios might include exceptional
situations which the system should handle. The automatic generation of exceptional
cases may look similar to the approach proposed in this thesis; however, the ap-
proach proposed by Sutcliffe et al. assumes that the possible abnormal pathways
are selected manually. These pathways are used for the generation of possible sce-
narios. This shows that this method is not able to generate the possible abnormal
events in an automatic way.
7.8 Conclusions
Event completeness is one of the dimensions of the quality of software require-
ments. The ad hoc approach to events identification provides poor results (see
Section 6.4). The proposed method aimed at events identification, called H4U, was
able to improve the effectiveness of manual events identification; however, the results
were still not satisfactory. Therefore, an automated method for event identification
has been proposed in this thesis. The method has been evaluated and compared
to the H4U method [65]. The following conclusions can be drawn from the results
presented in this thesis:
• The proposed method of automated identification of events can achieve higher
accuracy than manual approaches. The final accuracy was at the level of 89%,
compared to the average accuracy of the H4U method of 26%. The proposed
method also achieved high precision (0.93).
• The automated method is better than the H4U method in terms of speed. On
average it can analyse around 10 steps per minute, while people using H4U
are able to analyse about 1 step per minute.
• The event names generated by the implemented tool could not be differen-
tiated from the event names proposed by humans. This was investigated in
an experiment based on the assumptions of a Turing test, where participants
(students of Software Engineering) had to compare pairs of the proposed events.
This means that the proposed method could potentially be used in industrial
settings.
Chapter 8
Conclusions
The aim of the thesis was to investigate the problem of identification of events in
use cases. The achieved results confirmed the author’s surmise that the process of
identification of use-case events can be automated and can achieve effectiveness
which is not worse than that of manual approaches (in fact, it is even higher).
In particular, the following conclusions can be drawn from the conducted re-
search:
Conclusion#1: 9 bad smells resulting from violations of Cockburn’s guidelines can be
detected using simple NLP-based methods.
As the first step to improve the quality of use-case-based specification a set of
simple NLP-based methods was proposed. Those methods focus on detection of so-
called bad smells, i.e. violations of good practices proposed by Adolph et al. [9] and
Cockburn [31]. The methods included: one specification-level method, four use-
case-level methods and four step-level methods.
The methods were evaluated using 40 use cases. Different kinds of bad smells
could be found in those use-cases. The preliminary case study showed that those
bad smells could be detected by the proposed methods. Examples of how those
methods work are presented in Section 4.5.
Conclusion#2: A HAZOP-based method of manual identification of events in use
cases is more effective than an ad hoc approach.
Events in use cases can be treated as deviations in real-time systems. Based
on this assumption, the HAZOP-based method (called H4U) was proposed and de-
scribed in Chapter 6. H4U proposes new primary keywords based on analysis of
real-life use cases, while it preserves the original secondary keywords from the HA-
ZOP method.
The proposed method was evaluated in two experiments, the first one with 18
students, the second one with 64 IT professionals. As presented in Section 6.4, H4U
performed better than the ad hoc approach in terms of accuracy of events identi-
fication. Students using the H4U method were able to identify 26%, and using the
ad hoc approach 18% of the sensible events. In the case of IT professionals, they
were able to identify 17% of sensible events with the use of the H4U method and
14% with the ad hoc approach. The differences between H4U and ad hoc were sta-
tistically significant and showed that H4U helps in achieving higher effectiveness of
events identification than ad hoc approach.
Moreover, these experiments showed that event identification is not an easy task
and it is not easy to achieve a high level of completeness of events in use cases.
Conclusion#3: Manual identification of events in use cases is not highly effective.
The experiments described in Chapter 6 showed that the maximum accuracy of
events identification was equal to 0.45 and the maximal average accuracy was equal
to 0.26. These results reveal the problem of the low effectiveness of manual identi-
fication of events. Even using a specialised method, based on HAZOP, participants
of the experiments were not able to achieve more than 0.5 accuracy. This presents a
vast space for new solutions which could increase the effectiveness of events identi-
fication in use cases.
Conclusion#4: The automated method of events identification described in Section
7.3 is more effective than the examined manual approaches (ad hoc and H4U).
A method for automatic events identification was proposed in Chapter 7. The
method consists of four phases: preprocessing of information about actors and in-
formation objects; analysing a step using NLP techniques; inferring about the possi-
ble events using the proposed inference rules; generating event descriptions in nat-
ural language using NLG techniques.
In order to evaluate the proposed method, a prototype tool was implemented.
The tool was based on UC Workbench [88] and it used Stanford Parser [37] and
OpenNLP [1] tools for NLP analysis and SimpleNLG [46] for the task of natural lan-
guage generation.
The method was evaluated in two experiments. In the first experiment, the accu-
racy and speed of the automated method were compared to those of the H4U and ad
hoc approaches. The automated method was able to achieve accuracy equal to 0.89, while the
maximum average accuracy for H4U was equal to 0.26 and for ad hoc it was equal to
0.18. Another important measure is precision of identification of events defined in
Section 7.6. For the benchmark requirements specification the achieved precision
was 0.93. As expected, the prototype tool, although not optimised for speed, was
considerably faster than the manual approaches. The tool was able to analyse over
10 steps per minute, while the average speed for H4U was equal to 0.85 and
for ad hoc it was equal to 2.58.
In the second experiment, the understandability of the generated event descrip-
tions was evaluated. That experiment was a kind of Turing test [117]. Twelve stu-
dents were asked to compare pairs of events: one proposed by a human and one
generated by the prototype tool. The task of the participants was to decide which of
the two event descriptions was better from the language point of view. The experi-
ment showed that event descriptions generated by the tool are not worse than those
written by humans.
Future work In this thesis an automated method of events identification has been
proposed (a generator of event descriptions). A question arises whether it is possible
to further develop the method to perform an automatic review of events identified
by a human (a checker of completeness of identified events). The main and most
difficult problem is automatic discovery of equivalences between two sentences ex-
pressed in natural language.
From the practical point of view, it would also be important to extend the pro-
totype tool to allow analysis of use cases written in other natural languages (e.g.
Polish).
Appendix A
Material used in Turing test
experiment
Participant ID: ........................
Start time: ............................

The task: Evaluate (from the language point of view) the given events for the use-case steps. Mark every pair of events with one of the following options:

A >> B - event A is much better than event B
A > B - event A is slightly better than event B
A ? B - it’s hard to say which event is better
B > A - event B is slightly better than event A
B >> A - event B is much better than event A
1.
Step: 2. System presents a list of majors for which admission is available.
Event A: 2.A. No major is defined in the System.
Event B: 2.A. There are no majors for which admission is available
Evaluation: A >> B A > B A ? B B > A B >> A
2.
Step: 3. Candidate enters text of the post.
Event A: 3.A. Incomplete data of post was entered by Candidate.
Event B: 3.A. Candidate did not fill the text of post.
Evaluation: A >> B A > B A ? B B > A B >> A
3.
Step: 3. Administrator provides basic information concerning the admission.
Event A: 3.A. Administrator provides not enough data of admission.
Event B: 3.A. Incomplete data of admission was provided by Administrator.
Evaluation: A >> B A > B A ? B B > A B >> A
4.
Step: 3. System displays the content of the post.
Event A: 3.A. No post is defined in the System.
Event B: 3.A. No post available.
Evaluation: A >> B A > B A ? B B > A B >> A
5.
Step: 5. Administrator links languages available on the major.
Event A: 5.A. No languages chosen by Administrator.
Event B: 5.A. Language was not selected.
Evaluation: A >> B A > B A ? B B > A B >> A
6.
Step: 4. Candidate fills registration data and submits the form.
Event A: 4.A. Candidate provided wrong data.
Event B: 4.A. Wrong data of registration data was filled by Candidate.
Evaluation: A >> B A > B A ? B B > A B >> A
7.
Step: 2. System imports payment entries from the bank.
Event A: 2.A. The bank system is not available
Event B: 2.A. Connection problem occurred.
Evaluation: A >> B A > B A ? B B > A B >> A
8.
Step: 1. Selecting committee selects a Candidate who was qualified and selects an option to verify his data.
Event A: 1.A. No qualified candidate is defined in the System.
Event B: 1.A. No candidates are qualified.
Evaluation: A >> B A > B A ? B B > A B >> A
9.
Step: 3. Candidate chooses a major that he/she wants to pay for.
Event A: 3.A. No major was chosen by Candidate
Event B: 3.A. Candidate didn’t choose any major
Evaluation: A >> B A > B A ? B B > A B >> A
10.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Candidate does not have founds on account.
Event B: 6.A. Not enough money is left on credit card.
Evaluation: A >> B A > B A ? B B > A B >> A
11.
Step: 7. Selection committee confirms payment cancellation.
Event A: 7.A. Lack of confirmation from Selection committee.
Event B: 7.A. Cancelation is not confirmed by committee.
Evaluation: A >> B A > B A ? B B > A B >> A
12.
Step: 3. Candidate provides question.
Event A: 3.A. Incomplete data of question was provided by Candidate
Event B: 3.A. Empty string was provided by candidate
Evaluation: A >> B A > B A ? B B > A B >> A
13.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Credit card lost its validity.
Event B: 6.A. Credit card is expired.
Evaluation: A >> B A > B A ? B B > A B >> A
14.
Step: 3. Administrator confirms removing.
Event A: 3.A. Administrator does not confirm removal.
Event B: 3.A. Lack of confirmation from Administrator.
Evaluation: A >> B A > B A ? B B > A B >> A
15.
Step: 5. System presents the list of current issues.
Event A: 5.A. No issue is defined in the System.
Event B: 5.A. The list of current issues is empty.
Evaluation: A >> B A > B A ? B B > A B >> A
16.
Step: 6. Selecting committee analyses application as an issue.
Event A: 6.A. Too late for performing analyse operation.
Event B: 6.A. Selecting committee provides the analysis later than it was defined in the process.
Evaluation: A >> B A > B A ? B B > A B >> A
17.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Provided data is incorrect.
Event B: 6.A. Wrong data of credit card was provided by Candidate.
Evaluation: A >> B A > B A ? B B > A B >> A
18.
Step: 3. Administrator provides information and confirms creation of the subject
Event A: 3.A. Administrator duplicated existing subject.
Event B: 3.A. Subjects already exist in the System.
Evaluation: A >> B A > B A ? B B > A B >> A
19.
Step: 3. Administrator defines an algorithm in the algorithm editor.
Event A: 3.A. Algorithm already exists in the System.
Event B: 3.A. The same algorithm exists
Evaluation: A >> B A > B A ? B B > A B >> A
20.
Step: 2. Selecting committee chooses one of the admissions from the available in the system.
Event A: 2.A. No admission is defined in the System.
Event B: 2.A. There are no admissions available in the system.
Evaluation: A >> B A > B A ? B B > A B >> A
Finish time: .......................
Bibliography
[1] OpenNLP homepage http://opennlp.apache.org, last checked: 4th June 2013.
[2] Pablo Picasso’s quote. http://www.brainyquote.com/quotes/quotes/p/pablopicas102018.html, last checked: 1st August 2013.
[3] Stanford Log-linear Part-Of-Speech Tagger. http://nlp.stanford.edu/software/tagger.shtml, last checked: 4th June 2013.
[4] Stanford Tokenizer. http://nlp.stanford.edu/software/tokenizer.shtml, last checked: 4th June 2013.
[5] The Stanford Parser: A statistical parser. http://nlp.stanford.edu/software/lex-parser.shtml, last checked: 4th June 2013.
[6] Use Cases Database (UCDB). http://www.ucdb.cs.put.poznan.pl, last checked: 31st August 2013.
[7] Chaos: A recipe for success. Technical report, Standish Group, 2004.
[8] John Aberdeen, John Burger, David Day, Lynette Hirschman, Patricia Robinson, and Marc Vilain. MITRE: description of the Alembic system used for MUC-6. In Proceedings of the 6th conference on Message understanding, MUC6 ’95, pages 141–155, Stroudsburg, PA, USA, 1995. Association for Computational Linguistics.
[9] S. Adolph, P. Bramble, A. Cockburn, and A. Pols. Patterns for Effective Use Cases. Addison-Wesley, 2002.
[10] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15, 2003.
[11] A.J. Albrecht. Measuring application development productivity. Proceedings of the Joint SHARE/GUIDE/IBM Application Development Symposium, pages 83–92, October 1979.
[12] A.J. Albrecht. AD/M Productivity Measurement and Estimate Validation. IBM Corporate Information Systems and Administration, 1984.
[13] B. Alchimowicz, J. Jurkiewicz, M. Ochodek, and J. Nawrocki. Building benchmarks for use cases. Computing and Informatics, 29(1):27–44, 2010.
[14] Bartosz Alchimowicz, Jakub Jurkiewicz, Jerzy Nawrocki, and Mirosław Ochodek. Towards use-cases benchmark. In Software Engineering Techniques, volume 4980 of Lecture Notes in Computer Science, pages 20–33. Springer Verlag, 2011.
[15] I. Alexander. Scenario-driven search finds more exceptions. In Proceedings of the 11thInternational Workshop on Database and Expert Systems Applications, DEXA ’00, pages 991–,Washington, DC, USA, 2000. IEEE Computer Society.
[16] I. Alexander. Misuse cases: Use cases with hostile intent. IEEE Software, 20(1):58–66, 2003.
[17] Karen Allenby and Tim Kelly. Deriving safety requirements using scenarios. In in the 5th IEEEInternational Symposium on Requirements Engineering(RE’01), pages 228–235. Society Press,2001.
[18] Bente Anda and Dag I. K. Sj. Towards an inspection technique for use case models. InProceedings of the 14th international conference on Software engineering and knowledgeengineering, pages 127–134, 2002.
[19] David Baccarini, Geoff Salm, and Peter E. D. Love. Management of risks in information technology projects. Industrial Management and Data Systems, 104(4):286–295, 2004.
[20] Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. A maximum entropy approach to natural language processing. Comput. Linguist., 22(1):39–71, March 1996.
[21] B. Bernardez, A. Duran, and M. Genero. Empirical Evaluation and Review of a Metrics-Based Approach for Use Case Verification. Journal of Research and Practice in Information Technology, 36(4):247–258, 2004.
[22] Stefan Biffl and Michael Halling. Investigating the defect detection effectiveness and cost benefit of nominal inspection teams. IEEE Transactions on Software Engineering, 29(5):385–397, May 2003.
[23] D. Bjørner and C. B. Jones, editors. The Vienna Development Method: The Meta-Language, volume 61 of Lecture Notes in Computer Science. Springer-Verlag, 1978.
[24] Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, 1st edition, January 1984.
[25] John M. Carroll. Making Use: Scenario-Based Design of Human-Computer Interactions. MIT Press, Cambridge, MA, USA, 2000.
[26] John M. Carroll, Mary Beth Rosson, George Chin, Jr., and Jürgen Koenemann. Requirements development in scenario-based design. IEEE Trans. Softw. Eng., 24(12):1156–1170, December 1998.
[27] Ronald S. Carson. A set theory model for anomaly handling in system requirements analysis.In Proceedings of NCOSE, 1995.
[28] Ronald S. Carson. Requirements completeness: A deterministic approach. In Proceedings of the 8th Annual International INCOSE Symposium, 1998.
[29] Chemical Industries Association. A Guide to Hazard and Operability Studies, 1977.
[30] Alicja Ciemniewska, Jakub Jurkiewicz, Jerzy Nawrocki, and Łukasz Olek. Supporting use-case reviews. In 10th International Conference on Business Information Systems, volume 4439 of LNCS, pages 424–437. Springer Verlag, 2007.
[31] A. Cockburn. Writing Effective Use Cases. Addison-Wesley Boston, 2001.
[32] J. Cohen. Statistical power analysis. Current Directions in Psychological Science, 1(3):98–101,1992.
[33] Mike Cohn. User Stories Applied: For Agile Software Development. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 2004.
[34] Larry L. Constantine and Lucy A. D. Lockwood. Software for use: a practical guide to the models and methods of usage-centered design. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1999.
[35] Karl Cox, Aybüke Aurum, and D. Ross Jeffery. An experiment in inspecting the quality of use case descriptions. Journal of Research and Practice in Information Technology, 36(4), 2004.
[36] Marie-Catherine de Marneffe and Christopher D. Manning. Stanford typed dependencies manual.
[37] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure parses. In LREC 2006, 2006.
[38] Marie-Catherine de Marneffe and Ch. D. Manning. The Stanford typed dependencies representation. In Proceedings of the Coling 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, pages 1–8, 2008.
[39] C. Denger, B. Paech, and B. Freimut. Achieving high quality of use-case-based requirements. Informatik-Forschung und Entwicklung, 20(1):11–23, 2005.
[40] Steven J. DeRose. Grammatical category disambiguation by statistical optimization. Comput. Linguist., 14(1):31–39, January 1988.
[41] S. Diev. Use cases modeling and software estimation: Applying Use Case Points. ACM SIGSOFT Software Engineering Notes, 31(6):1–4, 2006.
[42] Chuan Duan, Paula Laurent, Jane Cleland-Huang, and Charles Kwiatkowski. Towards automated requirements prioritization and triage. Requir. Eng., 14(2):73–89, 2009.
[43] T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. 2003.
[44] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[45] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.
[46] Albert Gatt and Ehud Reiter. SimpleNLG: a realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation, ENLG '09, pages 90–93, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
[47] Martin Glinz. Improving the quality of requirements with scenarios, 2000.
[48] Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. Multiword expression identification with tree substitution grammars: a parsing tour de force with French. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 725–735, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[49] Constance L. Heitmeyer, Ralph D. Jeffords, and Bruce G. Labaw. Automated consistency checking of requirements specifications. ACM Trans. Softw. Eng. Methodol., 5(3):231–261, 1996.
[50] Hewlett-Packard. Applications of Software Measurement Conference, 1999.
[51] M. Höst, B. Regnell, and C. Wohlin. Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empirical Software Engineering, 5(3):201–214, 2000.
[52] Zhang Hong Hui and Atsushi Ohnishi. Integration and evolution method of scenarios from different viewpoints. In Proceedings of the 6th International Workshop on Principles of Software Evolution, IWPSE '03, pages 183–, Washington, DC, USA, 2003. IEEE Computer Society.
[53] R. Hurlbut. A Survey of Approaches for Describing and Formalizing Use Cases. Expertech, Ltd,1997.
[54] IEEE. IEEE Standard for Software Reviews (IEEE Std 1028-1997), 1997.
[55] IEEE. IEEE Recommended practice for software requirements specifications (IEEE Std830–1998), 1998.
[56] ISO. ISO/IEC 25010: 2011, Systems and software engineering–Systems and software QualityRequirements and Evaluation (SQuaRE)-System and software quality models, 2011.
[57] ISO/IEC. ISO/IEC 9126: Software engineering — Product quality — Parts 1–4, 2001–2004.
[58] Jonathan Jacky. The Way of Z: Practical Programming with Formal Methods. Cambridge University Press, November 1996.
[59] I. Jacobson. Concepts for modeling large real time systems. Royal Institute of Technology, Dept. of Telecommunication Systems-Computer Systems, 1985.
[60] I. Jacobson. Object-oriented development in an industrial environment. ACM SIGPLAN Notices, 22(12):183–191, 1987.
[61] I. Jacobson, M. Christerson, P. Jonsson, and G. Övergaard. Object-Oriented Software Engineering: A Use Case Driven Approach. Addison Wesley Longman, Inc, 1992.
[62] Ivar Jacobson. Use cases - yesterday, today, and tomorrow. Technical report, Rational Software,2002.
[63] Aleksander Jarzebowicz. A method for anomaly detection in selected models of information systems. PhD thesis, Gdansk University of Technology, 2007.
[64] Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.
[65] J. Jurkiewicz, J. Nawrocki, M. Ochodek, and T. Glowacki. HAZOP-based identification of events in use cases: an empirical study. Empirical Software Engineering, Accepted Manuscript, 2013.
[66] J. Jurkiewicz, M. Ochodek, and B. Walter. Specyfikacja wymagań dla systemów informatycznych (Guidelines for Developing SRS). © Copyrights by Poznan City Hall, 2009. http://www.se.cs.put.poznan.pl/projects/consulting/um-srs, last checked: 30th March 2011.
[67] Mayumi Itakura Kamata and Tetsuo Tamai. How does requirements quality relate to project success or failure? In Requirements Engineering Conference, 2007. RE '07. 15th IEEE International, pages 69–78, 2007.
[68] Leon A. Kappelman, Robert McKeeman, and Lixuan Zhang. Early warning signs of IT project failure: The dominant dozen. EDPACS, 35(1):1–10, January 2007.
[69] G. Karner. Metrics for objectory. No. LiTH-IDA-Ex-9344:21. Master's thesis, University of Linköping, Sweden, 1993.
[70] R. Kazman, M. Klein, and P. Clements. ATAM: Method for Architecture Evaluation. Technical report, Carnegie Mellon University, Software Engineering Institute, 2000.
[71] Mark Keil, Amrit Tiwana, and Ashley A. Bush. Reconciling user and project manager perceptions of IT project risk: a Delphi study. Inf. Syst. J., 12(2):103–120, 2002.
[72] Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 423–430, 2003.
[73] John C. Knight and E. Ann Myers. An improved inspection technique. Commun. ACM,36(11):51–61, 1993.
[74] Per Kroll and Philippe Kruchten. The rational unified process made easy: a practitioner's guide to the RUP. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
[75] Philippe Kruchten. The Rational Unified Process: An Introduction (3rd Edition). Addison-Wesley Professional, 3rd edition, 2003.
[76] H.G. Lawley. Operability studies and hazard analysis. Chemical Engineering Progress, 70(4):45, 1974.
[77] Beum-Seuk Lee and Barrett R. Bryant. Automated conversion from requirements documentation to an object-oriented formal specification language. In Proceedings of the 2002 ACM Symposium on Applied Computing, SAC '02, pages 932–936, New York, NY, USA, 2002. ACM.
[78] Dean Leffingwell. Calculating the return on investment from more effective requirements management. American Programmer, 10:13–16, 1997.
[79] Emmanuel Letier. Reasoning about agents in goal-oriented requirements engineering, 2001.
[80] Emmanuel Letier, Jeff Kramer, Jeff Magee, and Sebastián Uchitel. Monitoring and control in scenario-based requirements analysis. In ICSE, pages 382–391, 2005.
[81] Luisa Mich, Mariangela Franch, and Pierluigi Inverardi. Market research for requirements analysis using linguistic tools. Requir. Eng., 9(1):40–56, February 2004.
[82] Masayuki Makino and Atsushi Ohnishi. A method of scenario generation with differential scenario. In RE, pages 337–338, 2008.
[83] Brian W. Mar. Requirements for development of software requirements. In Proceedings of NCOSE, 1994.
[84] J. Martin. Managing the data-base environment. The James Martin books on computer systems and telecommunications. Prentice-Hall, 1983.
[85] Pete McBreen. Creating acceptance tests from use cases.
[86] Kjetil Moløkken and Magne Jørgensen. A review of surveys on software effort estimation. In Proceedings of the 2003 International Symposium on Empirical Software Engineering, ISESE '03, pages 223–, Washington, DC, USA, 2003. IEEE Computer Society.
[87] J. Nawrocki and Ł. Olek. UC Workbench - A tool for writing use cases. In 6th International Conference on Extreme Programming and Agile Processes, volume 3556 of Lecture Notes in Computer Science, pages 230–234. Springer-Verlag, June 2005.
[88] Jerzy Nawrocki and Łukasz Olek. UC Workbench – a tool for writing use cases and generating mockups. In Extreme Programming and Agile Processes in Software Engineering, volume 3556 of LNCS, pages 230–234. Springer, 2005.
[89] Jerzy Nawrocki and Łukasz Olek. Use-cases engineering with UC Workbench. In Krzysztof Zieliński and Tomasz Szmuc, editors, Software Engineering: Evolution and Emerging Technologies, volume 130, pages 319–329. IOS Press, October 2005.
[90] Jerzy Nawrocki, Łukasz Olek, Michal Jasinski, Bartosz Paliswiat, Bartosz Walter, Błażej Pietrzak, and Piotr Godek. Balancing agility and discipline with XPrince. In Rapid Integration of Software Engineering Techniques, volume 3943 of Lecture Notes in Computer Science, pages 266–277. Springer Berlin / Heidelberg, 2006.
[91] Clémentine Nebut, Franck Fleurey, Yves Le Traon, and Jean-Marc Jézéquel. Automatic test generation: A use case driven approach. IEEE Trans. Software Eng., 32(3):140–155, 2006.
[92] C.J. Neill and P.A. Laplante. Requirements engineering: The state of the practice. IEEE Software, 20(6):40–45, 2003.
[93] M. Niazi and S. Shastry. Role of requirements engineering in software development process: An empirical study. In Multi Topic Conference, 2003. INMIC 2003. 7th International, pages 402–407, 2003.
[94] Mirosław Ochodek. Empirical Examination of Use Case Points. PhD thesis, Poznan University of Technology, Poznan, Poland, 2011.
[95] Łukasz Olek. Generating acceptance tests from use cases. Technical Report RA-03/08, Poznan University of Technology, 2008.
[96] Łukasz Olek, Jerzy Nawrocki, Bartosz Michalik, and Mirosław Ochodek. Quick prototyping of web applications. In Lech Madeyski, Mirosław Ochodek, Dawid Weiss, and Jaroslav Zendulka, editors, Software Engineering in Progress, pages 124–137. NAKOM, 2007.
[97] OMG. Business object DTF: Common business objects, OMG document bom/97-11-11, version1.5, February 1997.
[98] OMG. OMG Unified Modeling Language™(OMG UML), superstructure, version 2.3, May 2010.
[99] Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill, 2001.
[100] David J. Pumfrey. The Principled Design of Computer System Safety Analyses, 1999.
[101] Anna N. Rafferty and Christopher D. Manning. Parsing three German treebanks: lexicalized and unlexicalized baselines. In Proceedings of the Workshop on Parsing German, PaGe '08, pages 40–46, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
[102] F. Redmill, M. Chudleigh, and J. Catmur. System safety: HAZOP and software HAZOP. Wiley,1999.
[103] D. Rosenberg and M. Stephens. Use Case Driven Object Modeling with UML: Theory and Practice. Apress, Berkeley, CA, USA, 2007.
[104] Serguei Roubtsov and Petra Heck. Use case-based acceptance testing of a large industrial system: Approach and experience report. In Proceedings of the Testing: Academic & Industrial Conference on Practice And Research Techniques, TAIC-PART '06, pages 211–220, Washington, DC, USA, 2006. IEEE Computer Society.
[105] Chris Sauer and Christine Cuthbertson. The State of IT Project Management in the UK 2002–2003. Technical report, Templeton College, University of Oxford, 2003.
[106] Roy Schmidt, Kalle Lyytinen, Mark Keil, and Paul Cule. Identifying software project risks: An international Delphi study. J. Manage. Inf. Syst., 17(4):5–36, March 2001.
[107] Ken Schwaber. Scrum development process. In Proceedings of the 10th Annual ACM Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 117–134, 1995.
[108] F. J. Shull, J. C. Carver, S. Vegas, and N. Juristo. The role of replications in empirical software engineering. Empirical Software Engineering, 13(2):211–218, 2008.
[109] M. Smiałek. Software Development with Reusable Requirements-based Cases. Prace Naukowe - Politechnika Warszawska: Elektryka. Oficyna Wydawnicza Politechniki Warszawskiej, 2007.
[110] S.S. Somé. Supporting use case based requirements engineering. Information and SoftwareTechnology, 48(1):43–58, 2006.
[111] Thitima Srivatanakul, John A. Clark, and Fiona Polack. Effective security requirements analysis: HAZOP and use cases. In Information Security: 7th International Conference, volume 3225 of Lecture Notes in Computer Science, pages 416–427. Springer, 2004.
[112] Tor Stålhane and Guttorm Sindre. A comparison of two approaches to safety analysis based on use cases. In Conceptual Modeling - ER 2007, volume 4801 of Lecture Notes in Computer Science, pages 423–437. Springer Berlin / Heidelberg, 2007.
[113] Tor Stålhane, Guttorm Sindre, and Lydie du Bousquet. Comparing safety analysis based on sequence diagrams and textual use cases. In Advanced Information Systems Engineering, volume 6051 of Lecture Notes in Computer Science, pages 165–179. Springer, 2010.
[114] George E. Stark, Paul W. Oman, Alan Skillicorn, and Alan Ameele. An examination of the effects of requirements changes on software maintenance releases. Journal of Software Maintenance, 11(5):293–309, 1999.
[115] Alistair G. Sutcliffe, Neil A.M. Maiden, Shailey Minocha, and Darrel Manuel. Supporting scenario-based requirements engineering. IEEE Transactions on Software Engineering, 24(12), December 1998.
[116] Hazel Taylor. Critical risks in outsourced IT projects: the intractable and the unforeseen. Commun. ACM, 49(11):74–79, November 2006.
[117] Alan M. Turing. Computing Machinery and Intelligence. Mind, LIX:433–460, 1950.
[118] Axel van Lamsweerde and Emmanuel Letier. Handling obstacles in goal-oriented requirements engineering. IEEE Trans. Softw. Eng., 26(10):978–1005, October 2000.
[119] Brenda Whittaker. What went wrong? Unsuccessful information technology projects. Inf. Manag. Comput. Security, 7(1):23–30, 1999.
[120] Karl E. Wiegers. Inspection checklist for use case documents. In Proceedings of the 9th IEEE International Software Metrics Symposium. IEEE, September 2003.
[121] K.E. Wiegers. Software Requirements. Microsoft Press, Redmond, WA, USA, 2nd edition, 2003.
[122] Willie and Schlechter. Process risk assessment - Using science to ’do it right’. InternationalJournal of Pressure Vessels and Piping, 61(2-3):479–494, 1995.
[123] J. Wood and D. Silver. Joint application development. Wiley, 1995.
[124] K. T. Yeo. Critical failure factors in information system projects. International Journal of Project Management, 20(3):241–246, April 2002.
[125] Eric S. K. Yu. Towards modeling and reasoning support for early-phase requirements engineering. In Proceedings of the 3rd IEEE International Symposium on Requirements Engineering, RE '97, pages 226–, Washington, DC, USA, 1997. IEEE Computer Society.
[126] Hong Zhu and Lingzi Jin. Scenario analysis in an automated tool for requirements engineering. Requir. Eng., 5(1):2–22, 2000.
[127] B.D. Zumbo and A.M. Hubley. A note on misconceptions concerning prospective and retrospective power. Journal of the Royal Statistical Society – Series D: The Statistician, 47(2):385–388, 1998.
© 2013 Jakub Jurkiewicz
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Typeset using LaTeX in Adobe Utopia. Compiled on November 7, 2013.
BibTEX entry:
@phdthesis{jjurkiewicz2013phd,
  author  = "Jakub Jurkiewicz",
  title   = "{Identification of Events in Use Cases}",
  school  = "Pozna{\'n} University of Technology",
  address = "Pozna{\'n}, Poland",
  year    = "2013",
}