Poznan University of Technology
Faculty of Computing
Identification of Events in Use Cases
Jakub Jurkiewicz
A dissertation
submitted in partial fulfillment of the
requirements for the degree of
Doctor of Philosophy.
Supervisor: Jerzy Nawrocki, PhD, Dr Habil.
Co-supervisor: Mirosław Ochodek, PhD
Poznan, Poland, 2013
A B S T R A C T
Context: Requirements are one of the most important elements of software development. They involve many stakeholders, including IT professionals (Analysts, Project Managers, Architects, etc.) and IT laymen (e.g. Customers, End Users, Domain Experts, Investors). In order to build a bridge between those two groups, requirements are often written in a natural language, e.g. in the form of use cases. They describe interaction between an end-user (called “actor”) and the system being built. The most important element of a use case is a main scenario leading to obtaining the actor’s goal in the most “typical” way. A main scenario is a sequence of steps taken by the actor or the system. Each step describes a simple action in a natural language (e.g. “Student enters his account and password.”). Deviations from the main scenario are also documented within a use case. They are usually called extensions and are described as alternative interactions. Each extension is a response to a given event causing the deviation (e.g. “Account or password is wrong.”). The first problem is effectiveness of event identification, i.e. the degree of completeness of identification of the events causing deviations from the main scenario (in an ideal case all the events should be identified). Another problem is efficiency, i.e. the time necessary to identify events. Identifying all such events is very important, as unidentified events could negatively impact not only coding but also effort estimation, user interface design, user manual writing, architecture evaluation, etc. In general, three approaches to event identification are possible: 1) an ad hoc approach used by humans, 2) a specialised method of event identification to be used by humans, 3) an automatic method to be executed by a computer.
Objective: The aim of the thesis is to propose an automatic method of event identification that would not be worse than identification made by humans, in terms of effectiveness (event completeness) and efficiency (time).
Method: To develop an automatic method of event identification, a pool of almost 200 use cases was divided equally into a learning part and a testing part. Through analysis of the learning part, 14 abstract event types and two inference rules were identified. Using NLP tools (Stanford Parser, OpenNLP) and a use case editor (UC Workbench), a pilot version of a tool for automatic event identification in use cases was designed and implemented. The tool was used to evaluate both the effectiveness (event completeness) and efficiency (time) of the proposed method against results obtained from human-based experiments performed in academia (18 students) and industry (64 practitioners). 9 students and 32 practitioners used the ad hoc approach; similarly, 9 students and 32 practitioners used a HAZOP-based method called H4U. Both the human-based experiment and the evaluation of the proposed method were based on a use-case benchmark developed at the Poznan University of Technology, Faculty of Computing.
Results: From the thesis it follows that:
• the average effectiveness of event identification performed by humans using the ad hoc approach is at the level of 0.18;
• the average effectiveness of event identification performed by humans using the HAZOP-based approach is at the level of 0.26, and the difference between those two approaches is statistically significant;
• an automatic tool for event identification can be created and its average effectiveness is at the level of 0.8 (the automatic method outperformed humans and this is statistically significant);
• the speed of analysis of use cases, for the purpose of event identification, with the ad hoc approach is at the level of 2.5 steps per minute and with the HAZOP-based method is at the level of 0.5 steps per minute;
• the speed of analysis of use cases by the automated tool, for the purpose of event identification, is at the level of 10.8 steps per minute;
• using NLP tools one can automatically identify 9 bad smells in use cases which are based on the good practices proposed by Adolph et al. in their book Patterns for Effective Use Cases.
Conclusions: It is possible to build a commercial tool aimed at automatic event identification in use cases that would be a part of an intelligent use-case editor. Such a tool would support a breadth-first approach to elicitation of use cases and thus would fit the agile approach to software development.
Politechnika Poznańska
Wydział Informatyki
Identification of Events in Use Cases
Jakub Jurkiewicz
A doctoral dissertation
Supervisor: Jerzy Nawrocki, PhD, Dr Habil.
Co-supervisor: Mirosław Ochodek, PhD
Poznan, 2013
S T R E S Z C Z E N I E
Context: Requirements are one of the most important elements of software development. Gathering them involves many stakeholders, including IT professionals (analysts, project managers, architects, etc.) as well as people from outside the IT field (e.g. customers, end users, domain experts). To build a bridge between these two groups, requirements are often written in natural language, e.g. in the form of use cases. Use cases describe the interaction between an end user (called an “actor”) and the system being built. The most important element of a use case is the main scenario, which describes a typical course of interaction leading to the actor achieving a specific goal. The main scenario is a sequence of steps performed by the actor and the system. Each step describes a simple action in natural language (e.g. “Student enters personal data.”). Deviations from the main scenario are also recorded in the use case. They are usually called extensions and are described by means of alternative scenarios. Each extension is a response to the event causing the deviation. The first problem is the effectiveness of event identification, i.e. how to identify all the events causing deviations from the main scenario. Another problem is efficiency, i.e. the time needed to identify events. Event identification is very important, as unidentified events may negatively affect not only the coding stage, but also effort estimation, user interface design, user documentation writing, evaluation of the system architecture, etc. In general, three approaches to event identification are possible: 1) an ad hoc approach used in manual reviews; 2) a specialised event identification method used in manual reviews; 3) an automatic method executed by a computer.
Objective: The aim of this thesis is to propose an automatic method of event identification that would identify events no worse (in terms of effectiveness and efficiency) than manual review methods.
Method: To develop a method of automatic event identification, a set of almost 200 use cases was divided equally into a learning part and a testing part. Analysis of the learning part allowed 14 abstract event types and two inference rules to be identified. Using natural language processing tools (Stanford Parser, OpenNLP) and a use-case editor (UC Workbench), a prototype tool for automatic event identification was implemented. The tool was evaluated in terms of effectiveness and efficiency. Its results were compared with the results obtained by experiment participants who used the ad hoc approach and an approach based on the HAZOP method. A benchmark requirements specification was used to evaluate the results of the tool and of the manually performed reviews.
Results: The thesis presents the following results:
• the average effectiveness of event identification performed with the ad hoc approach is 0.18;
• the average effectiveness of event identification performed with the HAZOP-based approach is 0.26; the difference in effectiveness between the ad hoc approach and the HAZOP-based approach is statistically significant;
• the effectiveness of the automatic event identification method is 0.8 (the automatic method gives better results than manual review methods, and this difference is statistically significant);
• the speed of use-case analysis, for the purpose of event identification, is 2.5 steps per minute with the ad hoc approach and 0.5 steps per minute with the HAZOP-based approach;
• the speed of automatic use-case analysis, for the purpose of event identification, is 10.8 steps per minute;
• NLP tools can be effectively used to detect bad smells in use cases; the presented methods can automatically detect 9 bad smells.
Conclusions: It is possible to build a commercial tool aimed at automatic identification of events in use cases. Such a tool could be part of an intelligent use-case editor. It would support an iterative approach to writing use cases and would thus fit agile software development methodologies.
Acknowledgments
This thesis would not have been possible without the help of numerous people. I
would like to mention some of them here. Without their help and support this thesis
would never have been written.
First of all, I would like to thank Professor Jerzy Nawrocki for showing and guid-
ing me through the exciting world of research problems and challenges. Without
his support and priceless feedback I could never have carried out the research pre-
sented in this thesis.
Many thanks also go to Miroslaw Ochodek, who, with his knowledge and expe-
rience, helped me at every stage of my research adventure.
I would also like to thank the rest of my colleagues from the Software Engineer-
ing group, with whom I had the pleasure to collaborate: Bartosz Alchimowicz, Sylwia
Kopczynska, Michal Mackowiak, Bartosz Walter, Adam Wojciechowski. They were
always eager to listen and help in dealing with the problems I encountered.
Experiments and case studies are a crucial part of this thesis. In real life it is
really hard to conduct experiments with an extended number of IT professionals,
however, there were companies which devoted the time of their employees for the
experimental tasks. Therefore, I would like to thank the following companies and or-
ganisations for their help: GlaxoSmithKline, Poznan Supercomputing and Network-
ing Center, Talex, Poznan University of Technology (especially people from the Es-
culap project) and PayU.
I am thankful to the Authorities of the Institute of Computing Science at Poz-
nan University of Technology for creating an environment in which I could study,
develop my research skills and learn from my colleagues.
I would also like to thank the Polish National Science Center for supporting my
research under the grant DEC-2011/01/N/ST6/06794.
Last but not least, I want to thank my wife, my parents, sister and friends for
their constant support and encouragement during my work on this thesis.
Jakub Jurkiewicz, Poznan, September 2013
List of journal papers by Jakub Jurkiewicz (sorted by 5-year impact factor):
Symbols:
IF = 5-year impact factor according to Journals Citations Report (JCR) 2012
Ranking = ISI ranking for the Software Engineering category according to JCR 2012
Scholar Citations = citations according to Google Scholar, visited on 2013-09-20
Ministry = points assigned by Polish Ministry of Sci. and Higher Education as of Dec. 2012.
1. J. Jurkiewicz, J. Nawrocki, M. Ochodek, T. Glowacki. HAZOP-based identifica-
tion of events in use cases. An empirical study, Empirical Software Engineering
DOI: 10.1007/s10664-013-9277-5 (available on-line but not assigned to an is-
sue yet)
IF: 1.756; Ranking: 20/105; Scholar Citations: 0; Ministry: 30.
2. M. Ochodek, B. Alchimowicz, J. Jurkiewicz, J. Nawrocki: Improving the relia-
bility of transaction identification in use cases, Information and Software Tech-
nology, 53(8), pp. 885 - 897, 2011.
IF: 1.692; Ranking: 24/105; Scholar Citations: 5; Ministry: 30.
3. B. Alchimowicz, J. Jurkiewicz, M. Ochodek, J. Nawrocki. Building benchmarks
for use cases. Computing and Informatics, 29(1), pp. 27 - 44, 2010.
IF: 0.305; Ranking: 104/115¹; Scholar Citations: 4; Ministry: 15
List of conference papers by Jakub Jurkiewicz (sorted by the Ministry points):
1. B. Alchimowicz, J. Jurkiewicz, M. Ochodek, J. Nawrocki. Towards Use-Cases
Benchmark, Lecture Notes in Computer Science, vol. 4980, pp. 20 - 33, 2011.
Scholar Citations: 1; Ministry: 10
2. A. Ciemniewska, J. Jurkiewicz, J. Nawrocki, L. Olek: Supporting Use-Case Re-
views, Lecture Notes in Computer Science vol. 4439, pp. 424 - 437, 2007.
Scholar Citations: 7; Ministry: 10
3. A. Gross, J. Jurkiewicz, J. Doerr, J. Nawrocki. Investigating the usefulness of no-
tations in the context of requirements engineering. 2nd International Workshop
on Empirical Requirements Engineering (EmpiRE), pp. 9 - 16, 2012.
Scholar Citations: 0; Ministry: 0
4. M. Ochodek, B. Alchimowicz, J. Jurkiewicz, and J. Nawrocki. Reliability of
transaction identification in use cases. Proceedings of the Workshop on Ad-
vances in Functional Size Measurement and Effort Estimation, pp. 51 - 58,
2010.
Scholar Citations: 4; Ministry: 0
¹Category: Artificial Intelligence
Contents
List of Abbreviations IV
1 Introduction 1
1.1 Scenario-based requirements specifications and events identifica-
tion problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Aim and scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Use cases and their applications 5
2.1 Levels of use cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Actors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Information Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Use Cases, Actors, Information Objects and Software Requirements
Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.5 Good practices for writing use cases . . . . . . . . . . . . . . . . . . . 9
2.5.1 Specify the actors . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.2 Use simple grammar . . . . . . . . . . . . . . . . . . . . . . . . 10
2.5.3 Indicate who is the actor in a step . . . . . . . . . . . . . . . 10
2.5.4 Do not use conditional statements in a main scenario . . . 10
2.5.5 Do not write too short or too long scenarios . . . . . . . . . 11
2.5.6 Do not use technical jargon and GUI elements in steps . . . 11
2.5.7 Write complete list of possible events . . . . . . . . . . . . . 11
2.5.8 All specified events should be detectable . . . . . . . . . . . 11
2.6 Applications of use cases . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Basic NLP methods and tools 13
3.1 Sentence segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Word tagging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.3 Part-of-speech tagging . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.4 Lemmatization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.5 Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.6 Dependency relations tagging . . . . . . . . . . . . . . . . . . . . . . . 17
4 Simple methods of quality assurance for use cases 18
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Use Cases and Bad Smells . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Natural Language Processing Tools for English . . . . . . . . . . . . . 21
4.4 Defects in Use Cases and their Detection . . . . . . . . . . . . . . . . 23
4.4.1 Specification-Level Bad Smells . . . . . . . . . . . . . . . . . . 24
4.4.2 Use-Case Level Bad Smells . . . . . . . . . . . . . . . . . . . . 25
4.4.3 Step-Level Bad Smells . . . . . . . . . . . . . . . . . . . . . . . 27
4.5 Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
5 Use-case-based benchmark specification 33
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 Benchmark-oriented model of use-case-based specification . . . . 35
5.3 Analysis of uc-based specifications structure . . . . . . . . . . . . . . 36
5.4 Analysis of use-cases-based specifications defects . . . . . . . . . . . 39
5.5 Building referential specification . . . . . . . . . . . . . . . . . . . . . 42
5.5.1 Specification domain and structure . . . . . . . . . . . . . . . 42
5.5.2 Use cases structure . . . . . . . . . . . . . . . . . . . . . . . . 43
5.6 Benchmarking use cases - case study . . . . . . . . . . . . . . . . . . 45
5.6.1 Quality analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.6.2 Time analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6 Identification of events in use cases performed by humans 50
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
6.2 An evolutionary approach to writing use cases . . . . . . . . . . . . . 52
6.2.1 Overview of use cases . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.2 Process of use-case elicitation . . . . . . . . . . . . . . . . . . 53
6.2.3 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.3 HAZOP based use-case review . . . . . . . . . . . . . . . . . . . . . . 56
6.3.1 HAZOP in a nutshell . . . . . . . . . . . . . . . . . . . . . . . . 56
6.3.2 H4U: HAZOP for Use Cases . . . . . . . . . . . . . . . . . . . . 57
6.3.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.4 Experimental evaluation of individual reviews . . . . . . . . . . . . . 59
6.4.1 Empirical approximation of maximal set of events . . . . . . 60
6.4.2 Experiment 1: Student reviewers . . . . . . . . . . . . . . . . 63
6.4.3 Experiment 2: Experienced reviewers . . . . . . . . . . . . . . 70
6.5 Usage of HAZOP keywords . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.6 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7 Automatic identification of events in use cases 78
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 Use-case based requirements specification . . . . . . . . . . . . . . . 81
7.3 Inference engine for identification of events . . . . . . . . . . . . . . 82
7.3.1 The problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.3.2 Abstract model of a use-case step . . . . . . . . . . . . . . . . 82
7.3.3 Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
7.4 Natural Language Processing of Use Cases . . . . . . . . . . . . . . . 91
7.4.1 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.4.2 NLP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.5 Generation of Event Descriptions . . . . . . . . . . . . . . . . . . . . . 93
7.6 Empirical evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.6.1 Reference set of events . . . . . . . . . . . . . . . . . . . . . . 93
7.6.2 Sensible events . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.3 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.6.4 Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.6.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.6.6 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
7.6.7 Turing test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.7 Related work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8 Conclusions 103
A Material used in Turing test experiment 106
Bibliography 112
List of Abbreviations
A
AFP Activity Free Properties
API Application Programming Interface
ASP Activity Sensitive Properties
ATAM Architecture Tradeoff Analysis Method
ATM Automatic Teller Machine
B
BPM Business Process Modeling
BPMN Business Process Model and Notation
C
CRUD Create, Retrieve, Update, Delete
CRUD+SV Create, Retrieve, Update, Delete, Select, Validate
CYK Cocke-Younger-Kasami algorithm
D
DBMS Database Management System
E
ESE Empirical Software Engineering
F
FUR Functional User Requirements
G
GUI Graphical User Interface
H
H4U HAZOP for Use Cases
HAZOP Hazard and Operability Study
I
IT Information Technology
K
KAOS Knowledge Acquisition in automated specification
J
JAD Joint Application Development
N
NFR Non-functional Requirement
NLP Natural Language Processing
NLG Natural Language Generation
NRMSD Normalized Root Mean Squared Deviation
O
OMG Object Management Group
P
PCFG Probabilistic Context Free Grammar
POS Part of Speech
S
SD Standard Deviation
SDS Software Development Studio
SE Software Engineering
SLIM Software Life-Cycle Management
SRS Software Requirements Specification
SuD System under Development (Design, or Discussion)
SVO Subject Verb Object
T
TBD To Be Determined
TTPoints Transaction-Typed Points
U
UC Use Case
UCDB Use Cases Database
UCP Use Case Points
UML Unified Modeling Language™
V
VDM Vienna Development Method
Chapter 1
Introduction
1.1 Scenario-based requirements specifications and events
identification problem
Requirements elicitation is one of the activities of software development. Requirements
are of two kinds: functional and nonfunctional. The former describe what a
system is able to do, whereas the latter present design constraints and quality
expectations concerning time behavior, security, portability, etc.
The role of software requirements cannot be overrated. The data presented by
a Standish Group report [7] show that about 20% of IT projects never finish suc-
cessfully and almost 50% face problems with on-time delivery and rising costs. Ac-
cording to the report, poor quality of requirements and creeping scope are two of
the most important factors contributing to project failures. This observation is sup-
ported by many other researchers [19, 68, 71, 105, 106, 116, 119, 124]. Niazi et al. [93]
present data based on interviews with IT specialists. Vague, inconsistent and incom-
plete requirements were the common answer to the question about the reason for
project failures. Another well known phenomenon is defect slippage between soft-
ware development phases. According to HP [50], fixing a defect in the requirements
elicitation phase costs 3 times less than fixing the same defect in the design phase
and 6 times less than in the coding phase. Thus, quality of requirements is an im-
portant issue.
One of the ways of expressing requirements is through scenarios. Glinz [47] de-
fines a scenario in the following way:
“A scenario is an ordered set of interactions between partners, usually
between a system and a set of actors external to the system."
According to Neill et al. [92], scenarios are used in over 50% of software projects.
There are two types of scenarios: narrative and dialog-like. Narrative scenarios
describe usage of the system in real-life situations [26] and they are used to present
a vision of the proposed system. They use story-telling techniques like mentioning
the settings of the scenario, introducing agents and using plot to present actions
[25]. Narrative scenarios are written without an imposed structure. As they describe
the system at a very high abstraction level and neglect many (sometimes quite im-
portant) details, they cannot be easily used by programmers.
The best known form of dialog scenarios are use cases, introduced by Jacobson
[59], and elaborated on by Cockburn [31]. Use cases have a structure consisting
of three main parts: a name expressing the goal of interaction, a main scenario in
the form of a sequence of steps (each step is an action taken by the system or a
human), and exceptions describing the possible deviations from the main scenario.
These kinds of scenarios allow for more detailed description of the requirements,
hence they better suited to the next software development phases (e.g. architectural
design, coding, black-box testing).
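The three-part structure described above (name, main scenario, extensions triggered by events) can be sketched as a small data model. The sketch below is purely illustrative; the class and field names, and the "Withdraw cash" fragment, are our own and not a notation defined in this thesis:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Step:
    number: int
    text: str  # one action in natural language, e.g. "Customer inserts the card."

@dataclass
class Extension:
    event: str                                        # the event causing the deviation
    steps: List[Step] = field(default_factory=list)   # alternative interaction

@dataclass
class UseCase:
    name: str                                         # the actor's goal
    main_scenario: List[Step] = field(default_factory=list)
    extensions: List[Extension] = field(default_factory=list)

# A fragment of a hypothetical "Withdraw cash" use case:
uc = UseCase(
    name="Withdraw cash",
    main_scenario=[
        Step(1, "Customer inserts the card."),
        Step(2, "System asks for the PIN."),
        Step(3, "Customer enters the PIN."),
    ],
    extensions=[
        Extension("3.A. Entered PIN is incorrect.",
                  [Step(1, "System asks the Customer to re-enter the PIN.")]),
    ],
)
print(uc.name, len(uc.main_scenario), len(uc.extensions))  # → Withdraw cash 3 1
```

Identifying events then amounts to populating the `extensions` part as completely as possible for a given main scenario.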
Use cases fit both classical and agile approaches to software development very
well. Use case names correspond to user stories used to describe the software prod-
uct (they would form the Product Backlog [107]). Frequently this level of abstraction
is too high and, for instance, one of the Scrum practices is Product Backlog
refinement/grooming, in which user stories are further elaborated into smaller ones
or tasks. The result of this work could be expressed in a form identical to the main
scenario of a use case (i.e. as a sequence of actions). If needed, such a dialog could
be even further elaborated upon by identifying events and actions taken in response
to their occurrence.
Identifying events and alternative scenarios corresponding to them is quite im-
portant for making the right architectural decisions. For instance, in the ATAM method
of software architecture evaluation, one of the activities is identifying alternative
scenarios [70].
Unfortunately, there is no research concerning the effectiveness of events iden-
tification carried out by software experts and very little has been done to help peo-
ple do this effectively. According to our observations, events are identified using the ad
hoc approach (the identification is done by the Analyst alone or during JAD ses-
sions [123]). Even the ATAM method does not suggest a particular way of identifying
events [70]. It is also not known if events identification could be done in an auto-
matic way and what the effectiveness of such an approach would be. Sutcliffe et al.
[115] describe a tool supporting the Analyst in identifying events, but it is restricted
only to events associated with violating non-functional requirements, and it requires
substantial input from an expert.
1.2 Aim and scope
The overall aim of this thesis is to answer the following research questions:
• Is it possible to automate identification of exceptional situations (i.e. events)
in use cases?
• What is the effectiveness of such an approach?
In order to answer those questions, this thesis should provide the following:
• A method for automatic identification of use-case events.
• Empirical evaluation of the effectiveness and efficiency of the proposed method
for automatic identification of events in use cases.
• Comparison of the effectiveness and efficiency of the proposed automatic method
and the method of manual reviews aimed at identification of use-case events.
Use cases are written using natural language; hence, the proposed automatic
method would have to be able to analyse use cases using Natural Language
Processing (NLP) techniques and tools. It is assumed that this thesis is not aimed at
the development of these kinds of methods; however, existing NLP tools would be
used in order to analyse use cases. Moreover, it is assumed that use cases written in
English will be analysed, so NLP tools aimed at analysis of English texts will be used.
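To illustrate why NLP tools are needed at all, consider recovering the actor (the grammatical subject) and the action from a single step. The naive whitespace split below, a sketch of our own making and not part of the proposed method, handles only the simplest subject-verb-object steps and already breaks on multi-word actors; real parsing is therefore delegated to existing NLP tools:

```python
def naive_svo(step: str):
    """Naively split a simple SVO step into (subject, verb, rest).

    Works only for single-word subjects; a step such as
    "Domain Expert approves the form." would be mis-split,
    which is exactly why a real parser is needed.
    """
    words = step.rstrip(".").split()
    return words[0], words[1], " ".join(words[2:])

print(naive_svo("Student enters his account and password."))
# → ('Student', 'enters', 'his account and password')
```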
Some parts of this thesis are based on already published papers or reports. Every
chapter which is based on published material is annotated with appropriate infor-
mation and with a structured abstract showing how the chapter contributes to the
thesis.
As use cases are the main concept used in this thesis, they and their applica-
tion are described in Chapter 2. A set of good practices and patterns for authoring
use cases is also presented in that chapter. This thesis uses existing NLP methods
and tools, hence, they are described in Chapter 3. The initial research related to
this thesis was about the creation of simple methods for improving the quality of
use cases. These methods and their evaluation are presented in Chapter 4. Analysis
of existing use-case-based requirements specifications showed that the analysed
use cases were of poor quality. A significant problem was related to missing
events, so the idea of automatic events identification was born. However, in order to
be able to evaluate the automatic method, how IT professionals identify events had
to be analysed. When preparing an experiment with IT professionals, it appeared
that there was no benchmark use-case-based requirements specification which could
be used to evaluate methods and tools targeted at analysis of use cases. Therefore,
this kind of benchmark specification was created. The process of its creation
is presented in Chapter 5. Chapter 6 contains a description of the HAZOP-based
method for manual identification of use-case events. That chapter is supplemented
with empirical evaluation (with the use of the benchmark specification) of review
speed and accuracy of the method. The most significant contribution of this thesis
is presented in Chapter 7, which describes the method of automatic identification of
events in use-cases. The method is accompanied by the evaluation of the effective-
ness and efficiency of the proposed approach. Chapter 8 recapitulates the results of
the thesis, and discusses possible future improvements and extensions of the pro-
posed approaches.
1.3 Conventions
Several typographical conventions are used in the thesis. The meaning of particular
mathematical symbols and variables is explained in the place where they are used
for the first time.
new term Key concepts, terms and quotes from other authors.
KEY TERM Notes about important concepts and terms appearing in the correspond-
ing paragraph.
[12] References to books, papers, and other sources of information.
Many people contributed to the results presented in this thesis. To emphasise
this, the pronoun “we” will be used throughout the thesis. However, despite the
help and inspiration I received from other people, I am taking full responsibility for
the presented research and the results described in this thesis.
Chapter 2
Use cases and their applications
There are numerous techniques and methods for specifying functional requirements.
APPROACHES TO REQUIREMENTS SPECIFICATION
They include system shall statements [55], user stories [33], formal methods (like
VDM [23] and Z notation [58]), goal-oriented notations (e.g. the i* notation [125]) and
use cases [60]. Every method has its advantages and disadvantages, and each is aimed
at different types of software systems and software development processes. Regardless
of the diversity of notations, use cases and scenario-based approaches remain
the most popular methods (according to Neill et al. [92], scenarios and use cases are
used in over 50% of software projects).
Use cases were introduced by Ivar Jacobson in the 1980s and were fully described
in 1992 [61] as a part of an object-oriented approach to software modelling. Use cases
come in two supplementary forms: text-based and diagram-based. The latter is a part
of the Unified Modeling Language™ (UML) [98].
USE CASE DIAGRAMS
A use-case diagram consists of four main parts: actor specification, use-case name
(actor's goal) specification, the relation between the actor and the use case, and the
system boundary. Figure 2.1 presents a simple use-case diagram with the mentioned
elements.
[Figure 2.1 shows the actor Customer linked to the use case Withdraw cash inside the ATM system boundary.]
Figure 2.1: An example of a use-case diagram (1 - actor's name; 2 - an ellipse holding the name of a use case; 3 - association between the use case and the actor; 4 - boundary of the system being built).
TEXT-BASED USE CASES
Use-case diagrams present only high-level information (mainly actors' goals)
about functional requirements; hence, text-based use cases are needed to present
the details of the requirements. Textual use cases consist of three main parts:
• use-case name describing the actor's goal;
• main scenario presenting the interaction between actors; this interaction should
lead to achieving the given goal;
• section containing alternative flows, extensions and exceptions; this section
describes other ways of achieving the goal, extensions to the main scenario,
and events which may interrupt the main scenario.
An example of a text-based use case is presented in Figure 2.2.
This thesis focuses on events. An exemplary event is labeled with number
5 in Figure 2.2. It can be noticed that the main scenario describing the withdrawal
of cash can be interrupted by different events, one of which is an incorrect PIN
entered by the customer. Events can be supplemented with scenarios describing how
the system should handle the abnormal situation. Information about events is
not present in use-case diagrams; hence, this thesis focuses only on text-based
use cases.
It can be noticed that use cases are written in natural language. This makes use
cases easier for IT laymen to understand, but harder to analyse in an automated
way. There are no strict syntax rules for use cases; however, some good practices
for writing use cases have been proposed [9, 31]. Following these guidelines can
help in achieving high-quality use cases and in analysing them automatically.
Examples of good practices for authoring use cases are presented in Section 2.5.
2.1 Levels of use cases
LEVELS OF USE CASES
Functional requirements can be described at different levels of abstraction. Some
readers of an SRS are interested in actors' goals, others in the details of the
interactions which lead to achieving those goals, and still others in the low-level
operations the system is supposed to perform. In order to cover these different
perspectives, the concept of levels of use cases was introduced. Cockburn [31]
suggests three main levels for use cases:
• summary level (also called business level) is used to describe interactions be-
tween actors which lead to achieving high-level business goals. Use cases at
this level often describe business processes inside a particular company. Per-
forming scenarios from use cases at the summary level can take days, months
or even years.
ID: UC1
Name: Withdraw cash
Actor: Customer
Main scenario:
1. Customer inserts the card.
2. ATM verifies the card.
3. Customer enters the PIN number.
4. ATM verifies the entered PIN number.
5. ATM asks for the amount.
6. Customer enters the amount.
7. ATM changes the balance on customer's account.
8. ATM provides the cash and the card.
9. Customer collects the money and the card.
Alternative flows, extensions, exceptions:
4.A. The PIN number is incorrect
4.A.1. ATM informs about the problem.
4.A.2. ATM returns the card.
4.A.3. Customer collects the card.
9.A. Customer does not collect the money in 30 seconds.
9.A.1. ATM takes the money back.
9.A.2. ATM changes the balance on customer's account.
Figure 2.2: An example of a text-based use case (1 - use-case name; 2 - main scenario; 3 - section with alternative flows, extensions to the main scenario and exceptions; 4 - an example of exception consisting of an event and a scenario; 5 - event which may interrupt the main scenario).
• user level is used to describe how particular user (actor) goals are achieved.
The use case from Figure 2.2 is an example of a use case at the user level. Per-
forming scenarios from use cases at this level should take no more than a few
minutes (one session of interaction between the user and the system).
• subfunction level (also called system level) is used to describe in detail the
operations which lead to achieving user goals.
Most use cases are written at the user level, as this level provides the most
important information about how user goals should be achieved. Therefore, this
thesis focuses on user-level use cases.
2.2 Actors
ACTORS
Different kinds of actors can take part in a use case. Most commonly, an actor
represents an end user who will interact with the system being built. Customer,
Accountant and Student are names of exemplary actors representing potential end
users. Another kind of actor is a system or device which interacts with the system
being built.
Usually these actors help in achieving users' goals. Examples of this kind of
actor might include: a banking system, a hotel booking system, a geolocation device.
TYPES OF ACTORS
An actor taking part in a use case can be a primary actor or a secondary actor.
The primary actor of a use case is the actor whose goal is being achieved in the
given use case. Secondary actors are actors which support the system and the
primary actor in achieving the goal described in the use case. Actors can play
different roles in different use cases, i.e. an actor can act as a primary actor in
one use case and as a secondary actor in another.
An additional actor taking part in use cases is the system being built, often also
called the system under development (SuD).
2.3 Information Objects
Systems described in a requirements specification can operate in different business
domains, e.g. health care, banking, education, gastronomy. Every domain uses dif-
ferent terminology, types of documents and other objects holding information. To
understand a given domain, the Analyst needs to describe it in order to create a
shared view of it [103]. This can be achieved by creating a model of the domain.
The simplest approach is to create a list of information objects (also called domain
objects or business objects [97]) with a name and a short description of each of them.
Information objects can hold a collection of different pieces of information, so
additionally, every object can have the data it holds specified (e.g. the information
object patient data can hold information about the patient's name, health insurance
number and personal id number). If required, the domain model can be modelled
using UML class diagrams [98]; however, in this thesis only textual descriptions of
information objects are used.
2.4 Use Cases, Actors, Information Objects and Software
Requirements Specification
One could wonder how use cases, actors and information objects are incorporated
into an SRS. Figure 2.3 presents an exemplary SRS template [66] (it is based on
the IEEE 830-1998 [55] and ISO-9126 [57] standards and suggestions from Wiegers
[121]). This template suggests a clear definition of actors (in Chapter 3.1) and infor-
mation objects (in Chapter 3.2) and the specification of use cases for every functional
module.
Software Requirements Specification
1. Introduction
  1.1. Purpose
  1.2. Document Conventions
  1.3. Product Scope
  1.4. References
2. Overall Description
  2.1. Product Perspective
  2.2. Product Features
  2.3. Constraints
  2.4. User Documentation
  2.5. Assumptions and Dependencies
3. Business Process Model
  3.1. Actors and User Characteristics
  3.2. Business Objects
  3.3. Business Processes
  3.4. Business Rules
4. Functional Requirements
  4.x. Functional Module X
    4.x.1. Description and Priority
    4.x.2. Use Cases
    4.x.3. Specific Functional Requirements
5. Interface Characteristics
  5.1. User Interfaces
  5.2. External Interfaces
    5.2.1. Hardware Interfaces
    5.2.2. Software Interfaces
    5.2.3. Communications Interfaces
6. Non-functional Requirements
7. Other Requirements
Appendix A: Glossary
Appendix B: Data Dictionary
Appendix C: Analysis Models
Appendix D: Issues List
Figure 2.3: An SRS template with marked chapters related to use-case, actor and information object descriptions.
2.5 Good practices for writing use cases
PROBLEMS WITH USE CASES
Use cases are written in natural language; hence, they are exposed to a number of
possible quality issues. One of them is ambiguity (e.g. lexical, syntactic, referential
or elliptical). Every natural language has a set of homonyms, i.e. words which have
more than one meaning, so using a homonym in a use-case step might lead
to different interpretations of the step. Moreover, the structure of a sentence might
allow different understandings of it. Similarly, incomplete sentences can lead to mis-
understandings. Another problem related to using natural language is the subjectivity
of phrases, e.g. the step Customer collects the money quickly leaves a broad space for
interpretations, as quickly might mean different periods of time for different readers.
An additional problem is linked to a lack of precision, e.g. the step Student provides
the data does not specify exactly what data is provided, and this may lead to confusion.
GOOD PRACTICES
All these (and many other) problems were observed by practitioners and led to the
creation of guidelines and patterns for authoring use cases. Below is a list of the
good practices which are mentioned in this thesis.
2.5.1 Specify the actors
It may seem natural to move straight to writing use cases instead of focusing on
the specification of actors first. Lack of definition of some of the actors might lead to
missing some of the users' goals, which might result in omitting important system
behaviour. Therefore Adolph et al. [9] suggest the ClearCastOfCharacters pattern, saying:
Identify the actors and the role each plays with respect to the system. Clearly describe
each of the actors the system must interact with.
2.5.2 Use simple grammar
Cockburn [31] says that The sentence structure should be absurdly simple and that it
should use the grammatical structure Subject + Verb + Object. Missing any part of this
structure might make the step hard to understand, while adding additional elements to
this structure might lead to misinterpretation.
2.5.3 Indicate who is the actor in a step
It should always be clear who performs the action in a step. The actor should be the
subject of the sentence. Omitting the actor might lead to different interpretations, e.g.
the step Send the message does not indicate which actor performs the sending operation:
is it the main actor of the use case, some secondary actor, or maybe the system
being built? Writing System sends the message makes it clear which actor performs
the operation.
2.5.4 Do not use conditional statements in a main scenario
Cockburn [31] argues that using conditional statements in main scenarios does not
move the process forward and should be avoided. The step If the password is correct,
the System presents available invoices is an example of a sentence with a conditional
statement. Cockburn suggests using a validate phrase in such cases, e.g. The system
validates that the password is correct. In this way, the so-called happy day scenario is
presented in the main scenario, and all the failures and unexpected situations can
be moved to the Exceptions section of the use case. This approach is also advised
by Adolph et al. [9] in their ScenarioPlusFragments pattern, saying: Write the success
storyline as a simple scenario without any consideration of possible failures. Below it,
place story fragments that show what alternative condition might have occurred.
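The practice above lends itself to a simple automated check. The sketch below flags steps that open with a conditional marker; the keyword list is an illustrative assumption, not a method proposed in this thesis, and a real review tool would use full sentence parsing.

```python
import re

# Words that typically open a conditional statement in a use-case step.
# This keyword list is an illustrative assumption, not an exhaustive rule set.
CONDITIONAL_MARKERS = re.compile(r"^\s*(if|when|in case|unless)\b", re.IGNORECASE)

def flag_conditional_steps(steps):
    """Return the steps that look like conditional statements."""
    return [s for s in steps if CONDITIONAL_MARKERS.search(s)]

steps = [
    "Customer enters the PIN number.",
    "If the password is correct, the System presents available invoices.",
    "ATM verifies the entered PIN number.",
]
print(flag_conditional_steps(steps))
# ['If the password is correct, the System presents available invoices.']
```

A flagged step is only a warning: the final decision on rewriting it with a validate phrase stays with the Analyst.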
2.5.5 Do not write too short or too long scenarios
It is advised to write main scenarios with 3-9 steps. Too short main scenarios often
do not describe any valuable goals, and too long scenarios make the interaction hard
to read and understand. This good practice is supported by the EverUnfoldingStory
pattern from Adolph et al. [9]. The pattern states: Organize the use case set as a
hierarchical story which can be unfolded to get more detail, or folded to hide detail
and get more context. This pattern counteracts too long scenarios, as it advises using
use cases at different levels instead of writing all the information in use cases at one
level.
2.5.6 Do not use technical jargon and GUI elements in steps
If the Analyst has a technical background, it is often natural for him/her to use techni-
cal jargon and names of Graphical User Interface (GUI) elements in use-case steps.
However, Cockburn [31] reminds us: Verify that the step you just wrote captures the real
intent of the actor, not just the movements in manipulating the user interface.
2.5.7 Write complete list of possible events
Focusing only on the main scenario, which presents the most common interaction be-
tween the actors, may lead to missing additional (non-typical) system behaviour.
Therefore, Adolph et al. [9] proposed the ExhaustiveAlternatives pattern. The pattern
states: Capture all failures and alternatives that must be handled in the use case. They
also emphasize that Error handling and exception processing comprise the bulk of most
software-based systems and Having information about the variations helps develop-
ers build a robust design.
2.5.8 All specified events should be detectable
If a specified use-case event cannot be detected by the system, the scenario which
should handle this event will never be executed; therefore, it is not necessary to
specify this scenario or this event. For example, in the use case from Figure 2.2,
event 9.A. could be written as Customer bent down to tie his/her shoelace. The ques-
tion arises whether the ATM is able to detect this particular event. Probably not; hence, it is
better to write the event the way it is stated in Figure 2.2: Customer does not collect
the money in 30 seconds. This event can be detected by the ATM and can be han-
dled as stated in the use case. These kinds of issues are discussed in the Detectable-
Conditions pattern from Adolph et al. [9]. The pattern states: Include only detectable
conditions. Merge conditions that have the same net effect on the system.
2.6 Applications of use cases
One of the reasons why use cases have become popular is their possible applications
in other software development phases and activities. Once use cases are written
down, analysed and discussed with the customer and other team members, they can
be used for:
• effort estimation - one of the main methods for effort estimation is based on
use cases. The method is called Use Case Points (UCP) and was proposed by
Karner [69]. UCP takes into account the complexity of the specified actors, the com-
plexity of the use cases, and a number of technical and environmental fac-
tors. Based on these, the use-case points for the given project are calculated. These
points, together with data about previous projects, can be used to es-
timate the effort for developing the project. Another method for effort estima-
tion using use cases is called TTPoint and was proposed by Ochodek [94].
This method takes into account the specification of information objects and
the different types of transactions which can be found in use cases.
• prototyping - use cases can be integrated with prototyping activities. The UC Work-
bench tool described by Nawrocki et al. [88] allows for attaching User Interface
(UI) sketches to use-case steps. Based on these, a mockup can be generated for
easy analysis of the requirements and the sketched UI. This approach was em-
pirically validated by Olek et al. [96] with the conclusion that use cases and UI
sketches can be successfully used for prototyping web applications.
• testing - use-case scenarios can be used as acceptance-test scenarios. This
approach is advocated by McBreen [85] and validated empirically by Roubtsov
et al. [104]. Moreover, Olek [95] proposed an approach which allows for auto-
matic generation of acceptance tests from use cases.
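Karner's UCP computation mentioned in the effort-estimation bullet can be sketched roughly as follows. The weights and constants (actor weights 1/2/3, use-case weights 5/10/15, TCF = 0.6 + 0.01 · TFactor, ECF = 1.4 − 0.03 · EFactor, about 20 person-hours per point) are the commonly cited values from the literature and are stated here as an illustration, not as a normative implementation of the method.

```python
# A rough sketch of Karner's Use Case Points computation, using the
# commonly cited weights; treat the constants as illustrative assumptions.

ACTOR_WEIGHTS = {"simple": 1, "average": 2, "complex": 3}
USE_CASE_WEIGHTS = {"simple": 5, "average": 10, "complex": 15}

def use_case_points(actors, use_cases, tfactor, efactor):
    """actors/use_cases: lists of complexity labels; tfactor/efactor:
    weighted sums of the technical and environmental factor ratings."""
    uaw = sum(ACTOR_WEIGHTS[a] for a in actors)          # Unadjusted Actor Weight
    uucw = sum(USE_CASE_WEIGHTS[u] for u in use_cases)   # Unadjusted Use Case Weight
    tcf = 0.6 + 0.01 * tfactor                           # Technical Complexity Factor
    ecf = 1.4 - 0.03 * efactor                           # Environmental Complexity Factor
    return (uaw + uucw) * tcf * ecf

ucp = use_case_points(
    actors=["simple", "complex"],
    use_cases=["average", "average", "complex"],
    tfactor=30,
    efactor=15,
)
effort_hours = ucp * 20   # Karner's suggested 20 person-hours per point
print(round(ucp, 1), round(effort_hours))
```

Note that unidentified events shrink the extension count of a use case and can shift its complexity class, which is exactly why incomplete event identification distorts such estimates.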
This diversity of applications shows that use cases can play a major role in software
development; hence, it is important to assure the high quality of use cases. This can
be done manually, using reviews and the mentioned good practices, and automatically,
with appropriate tools. Both approaches to improving the quality of use cases, manual
and automatic, are presented in this thesis.
Chapter 3
Basic NLP methods and tools
A survey conducted by Mich et al. [81] shows that natural language is used in the major-
ity of requirements specifications. Documents based on use cases are no different,
which means that they are often ambiguous and vague, and this can result in mis-
understandings and communication problems between stakeholders.
Participants of the mentioned survey were asked: What would be the most use-
ful thing to improve general day-to-day efficiency? Most of them (64%) answered
with automation. This shows that there is a strong need for tools which are able
to analyse and process requirements in an automatic way. A number of tools [42,
49, 77, 91, 126] have been proposed for automated processing of requirements; however,
most of them are designed for formal or language-restricted methods of require-
ments modelling. Taking into account that the majority of requirements are written in
natural language, there is a demand for tools which can process natural-language-
based requirements. This can be done, to some extent, using existing Natural Lan-
guage Processing (NLP) tools. This thesis uses two NLP tools: the Stanford Parser [5]
and the OpenNLP [1] library, which provide basic operations helping in the analysis of
natural language sentences. The remainder of this chapter presents the most important
parts of NLP tools required for effective and efficient processing of use-case steps.
3.1 Sentence segmentation
Sentence segmentation is used for distinguishing sentences within larger texts. In most
written natural languages sentences are separated by punctuation marks; however,
some of these punctuation marks can also be used for other purposes, e.g. a dot can
be used in abbreviations like Dr. or i.e. There are two main approaches to the problem
of sentence segmentation: a traditional regular-expression-based one (e.g. the Alembic
information extraction system [8]) and a trainable one (e.g. the algorithm using
regression trees by Breiman et al. [24]). The sentence detector from OpenNLP used in
this thesis uses the second approach. It may seem that sentence segmentation is not
necessary for the analysis of use-case steps, as a step consists of one sentence. In fact,
there is no formal requirement for a step to contain only one sentence, and the Analyst
can write as many sentences in one step as he/she wants. Sentence segmentation
should work as follows: for the input
System filters the data according to the given criteria. System saves the filtered data.
System displays confirmation.
it should return the following output:
• Sentence 1: System filters the data according to the given criteria.
• Sentence 2: System saves the filtered data.
• Sentence 3: System displays confirmation.
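The traditional regular-expression-based approach mentioned above can be sketched in a few lines. This is only an illustration of the idea, with a tiny hand-picked abbreviation list; OpenNLP's trainable detector, used in this thesis, is far more robust.

```python
# A minimal rule-based sentence splitter: a token ending with '.', '!' or '?'
# closes a sentence unless it is a known abbreviation.  The abbreviation list
# is an illustrative assumption.
ABBREVIATIONS = {"dr.", "i.e.", "e.g.", "etc."}

def split_sentences(text):
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:                       # flush a trailing unterminated sentence
        sentences.append(" ".join(current))
    return sentences

text = ("System filters the data according to the given criteria. "
        "System saves the filtered data. System displays confirmation.")
for s in split_sentences(text):
    print(s)
```

On the example input above, the sketch reproduces the three sentences listed in the text; it would fail on harder cases (quoted speech, numbers like "3.5"), which is precisely what motivates the trainable approach.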
3.2 Word tagging
This phase is used for distinguishing the separate words of a sentence (use-case step).
In the case of space-delimited languages, like English, words are separated by white
spaces, which makes word tagging simpler than for unsegmented languages (like
Chinese); however, there are still challenges to overcome. One of the challenges is
related to punctuation marks, like periods, commas or apostrophes. These elements
can have different meanings and can change the function of a word. Another challenge
is linked to multi-part words and multi-word expressions. These constructions
are often used in strongly agglutinative languages (like Turkish or Swahili), but they
can also be found in German (in noun-noun or adverb-noun constructions) and
even in English (usually with the use of a hyphen, e.g. Poznan-based). A common ap-
proach to word tagging is called the maximum matching algorithm, which uses a list
of words for the language being analysed. Based on the list, the algorithm is able to
find the separate words in the given text. The Stanford Tokenizer [4] was used in this
thesis for word tagging. Word tagging should work as follows: for the input
System displays confirmation.
it should return the following list of tagged words
• Word 1: System
• Word 2: displays
• Word 3: confirmation
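The maximum matching algorithm mentioned above can be sketched as follows: starting from the left, greedily take the longest dictionary entry that matches. The toy lexicon is an assumption for illustration, and the input has its spaces removed, since for space-delimited English the whitespace already does most of the work.

```python
# A sketch of the maximum matching algorithm: greedy longest-first lookup
# against a word list.  The lexicon here is a toy assumption.
LEXICON = {"system", "displays", "display", "confirmation", "confirm"}

def max_match(text, lexicon=LEXICON, longest=15):
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a lexicon hit.
        for j in range(min(len(text), i + longest), i, -1):
            candidate = text[i:j]
            if candidate in lexicon:
                words.append(candidate)
                i = j
                break
        else:                      # no dictionary word found: emit one character
            words.append(text[i])
            i += 1
    return words

print(max_match("systemdisplaysconfirmation"))
# ['system', 'displays', 'confirmation']
```

The greedy strategy prefers "displays" over the shorter "display", which is why the algorithm needs a reasonably complete word list to avoid bad segmentations.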
3.3 Part-of-speech tagging
The task of a part-of-speech (POS) tagger is to assign a part of speech (e.g. noun, verb,
adjective) to every word in a sentence. The problem with part-of-speech tagging is
that the same word can be a different part of speech, depending on the context of
the sentence. For example, the word displays in the following sentences can be a verb or
a noun:
• verb example: System displays confirmation.
• noun example: The confirmation is presented on displays.
According to DeRose's report [40], over 40% of English word types are ambiguous. The
task of part-of-speech tagging is to resolve these ambiguities, assigning the proper
part of speech to the given word. There are two main approaches to part-of-speech
tagging: rule-based and stochastic [64]. The former requires a large set of hand-
written disambiguation rules, while the latter uses a training corpus to compute the
probability of a given word being a given part of speech. In this thesis the Stanford
Part-Of-Speech Tagger [3] is used, which follows the probabilistic approach with
maximum entropy models [20] in order to assign parts of speech to words. The POS
tagger should work as follows: for the input
System displays confirmation.
it should return the following list of parts of speech:
• System: noun
• displays: verb
• confirmation: noun
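The stochastic approach can be illustrated in its simplest form: estimate P(tag | word) from a tiny hand-made "corpus" of (word, tag) pairs and pick the most probable tag for each word. The training data below is an assumption for demonstration only; real taggers, such as the maximum entropy tagger used in this thesis, also condition on the surrounding context to resolve ambiguity.

```python
from collections import Counter, defaultdict

# A toy unigram tagger: count tags per word in a tiny training set and
# choose the most frequent one.  'displays' is deliberately ambiguous
# (verb twice, noun once), mirroring the example in the text.
training = [
    ("system", "noun"), ("system", "noun"),
    ("displays", "verb"), ("displays", "verb"), ("displays", "noun"),
    ("confirmation", "noun"),
]

counts = defaultdict(Counter)
for word, tag_label in training:
    counts[word][tag_label] += 1

def tag(sentence):
    return [(w, counts[w.lower()].most_common(1)[0][0])
            for w in sentence.rstrip(".").split()]

print(tag("System displays confirmation."))
# [('System', 'noun'), ('displays', 'verb'), ('confirmation', 'noun')]
```

Because the unigram model ignores context, it would tag displays as a verb even in "The confirmation is presented on displays"; context-sensitive models exist precisely to fix this.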
3.4 Lemmatization
The goal of this phase is to find a common base form (called a lemma) of a given
word. Jurafsky et al. [64] define a lemma as:
“a set of lexical forms having the same stem, the same major part-of-
speech, and the same word-sense.”
This task is challenging because of the many possible forms of a given word
(e.g. book, books, books', book's have the same lemma book, and am, are, is have
the same lemma be). The lemmatizer from the morphology component of the Stanford
Parser [5] is used in this thesis for the task of lemmatization. The lemmatizer should
work as follows: for the input
System displays user's data.
it should return the following output
• for the word System the lemma is system
• for the word displays the lemma is display
• for the word user’s the lemma is user
• for the word data the lemma is data
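A crude lemmatizer can be sketched as an exception list for irregular forms plus a few suffix-stripping rules. The rules below are illustrative assumptions that merely reproduce the example above; the Stanford morphology component used in the thesis performs a full finite-state morphological analysis.

```python
# A minimal lemmatizer sketch: exception dictionary plus suffix rules.
# The rule set is a toy assumption, tuned only to the running example.
IRREGULAR = {"am": "be", "are": "be", "is": "be", "data": "data"}

def lemmatize(word):
    w = word.lower().replace("'s", "").strip("'")   # drop possessive markers
    if w in IRREGULAR:
        return IRREGULAR[w]
    if w.endswith("ies"):                           # e.g. criteria-like plurals: parties -> party
        return w[:-3] + "y"
    if w.endswith("s") and not w.endswith("ss"):    # plain plural / 3rd person -s
        return w[:-1]
    return w

for token in ["System", "displays", "user's", "data"]:
    print(token, "->", lemmatize(token))
# System -> system
# displays -> display
# user's -> user
# data -> data
```

Suffix stripping alone overgeneralizes (it would turn bus into bu), which is why the exception list, and in real systems a full lexicon, is essential.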
3.5 Parsing
A general definition of parsing provided by Jurafsky et al. [64] is as follows:
“Parsing means taking an input and producing some sort of structure
for it.”
In the case of natural-language parsing, a sentence is the input and a parse tree is the
output. Parsing natural language is much more challenging than parsing a formal
language because usually several parse trees are possible for the same sen-
tence. There are two main strategies for parsing: top-down (also called goal-directed
search) and bottom-up (also called data-directed search). The Stanford Parser [72] uses
probabilistic methods for the creation of parse trees based on the CYK algorithm (using
the bottom-up approach) [64]. In order to use this approach, a Probabilistic Context-Free
Grammar (PCFG) is assumed. The output from the Stanford Parser for the sentence
System filters the data according to the given criteria.
is presented in Figure 3.1.
Parse tree:
(ROOT
  (S
    (NP (NNP System))
    (VP (VBZ filters)
      (NP (DT the) (NNS data))
      (PP (VBG according)
        (PP (TO to)
          (NP (DT the) (VBN given) (NNS criteria)))))
    (. .)))
Figure 3.1: An example of a parse tree generated by the Stanford Parser for the sentence System filters the data according to the given criteria.
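The bottom-up CYK strategy mentioned above can be sketched on a toy PCFG in Chomsky Normal Form. The grammar, its probabilities and the three-word sentence are illustrative assumptions, nothing like the Stanford Parser's actual grammar; the point is only the chart-filling idea: lexical entries go on the diagonal, then ever wider spans are built from pairs of adjacent sub-spans.

```python
import math

# Toy PCFG in Chomsky Normal Form (illustrative, not the Stanford grammar).
lexicon = {
    "system": [("NP", 1.0)],
    "displays": [("V", 1.0)],
    "confirmation": [("NP", 1.0)],
}
rules = [            # (lhs, rhs1, rhs2, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 1.0),
]

def cyk(words):
    n = len(words)
    # best[i][j] maps a nonterminal to (log-probability, backpointer)
    best = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                       # lexical entries on the diagonal
        for sym, p in lexicon.get(w, []):
            best[i][i + 1][sym] = (math.log(p), w)
    for span in range(2, n + 1):                        # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                   # split point
                for lhs, r1, r2, p in rules:
                    if r1 in best[i][k] and r2 in best[k][j]:
                        lp = math.log(p) + best[i][k][r1][0] + best[k][j][r2][0]
                        if lhs not in best[i][j] or lp > best[i][j][lhs][0]:
                            best[i][j][lhs] = (lp, (r1, i, k, r2, k, j))
    return best

def tree(best, sym, i, j):
    """Read the highest-probability tree back out of the chart."""
    bp = best[i][j][sym][1]
    if isinstance(bp, str):
        return f"({sym} {bp})"
    r1, a, b, r2, c, d = bp
    return f"({sym} {tree(best, r1, a, b)} {tree(best, r2, c, d)})"

words = "system displays confirmation".split()
chart = cyk(words)
print(tree(chart, "S", 0, len(words)))
# (S (NP system) (VP (V displays) (NP confirmation)))
```

With several competing rules, the log-probability comparison is what lets the algorithm keep only the most probable analysis for each span, which is exactly how a PCFG resolves the multiple parse trees a natural-language sentence admits.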
3.6 Dependency relations tagging
This phase provides a representation of the dependency relations between words in a
given sentence. These relations allow for distinguishing the main parts of the sentence,
such as subject, predicate and object. The types of relations used in the Stanford Parser
are described by de Marneffe et al. [36]. The grammatical relations tagger from the
Stanford Parser [37] works as follows: for the input
System filters the data according to the given criteria.
it returns the grammatical relations presented in Figure 3.2. From these relations,
and taking into account the part of speech of every word and the parse tree, the follow-
ing main elements can be distinguished for the given example:
• subject: system
• predicate: filter
• object: data
Typed dependencies:
nsubj(filters-2, System-1)
root(ROOT-0, filters-2)
det(data-4, the-3)
dobj(filters-2, data-4)
prepc_according_to(filters-2, to-6)
det(criteria-9, the-7)
amod(criteria-9, given-8)
pobj(filters-2, criteria-9)
Figure 3.2: An example of grammatical relations generated by the Stanford Parser for the sentence System filters the data according to the given criteria.
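Given typed dependencies printed in the notation of Figure 3.2, extracting the main sentence elements is mostly string processing: the dependent of nsubj is the subject, its governor is the predicate, and the dependent of dobj is the object. The sketch below is an illustrative simplification; note it keeps the surface form filters, so a separate lemmatization step (Section 3.4) would still be needed to reduce the predicate to filter as listed in the text.

```python
import re

# A subset of the typed dependencies from Figure 3.2 (Stanford notation).
deps_text = """nsubj(filters-2, System-1)
root(ROOT-0, filters-2)
det(data-4, the-3)
dobj(filters-2, data-4)
det(criteria-9, the-7)
amod(criteria-9, given-8)
pobj(filters-2, criteria-9)"""

# relation(governor-INDEX, dependent-INDEX)
DEP = re.compile(r"(\w+)\((\w+)-\d+, (\w+)-\d+\)")

def main_elements(deps_text):
    """Pick out subject, predicate and object from nsubj/dobj relations."""
    elems = {}
    for rel, gov, dep in DEP.findall(deps_text):
        if rel == "nsubj":          # nominal subject: the governor is the predicate
            elems["subject"] = dep.lower()
            elems["predicate"] = gov.lower()
        elif rel == "dobj":         # direct object of the predicate
            elems["object"] = dep.lower()
    return elems

print(main_elements(deps_text))
# {'subject': 'system', 'predicate': 'filters', 'object': 'data'}
```

This subject/predicate/object triple is precisely the Subject + Verb + Object structure that the good practice in Section 2.5.2 asks use-case steps to follow, which is what makes the relations useful for automated analysis of steps.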
Chapter 4
Simple methods of quality
assurance for use cases
Preface
This chapter contains the paper: Alicja Ciemniewska, Jakub Jurkiewicz, Jerzy
Nawrocki and Łukasz Olek, Supporting Use-Case Reviews, 10th International Confer-
ence on Business Information Systems, Lecture Notes in Computer Science, Vol. 4439,
(2007).
A deeper analysis of the results of the proposed methods is presented in Section 5.6
of Chapter 5.
My contribution to this paper included the following tasks: (1) analysis of avail-
able NLP tools; (2) development and implementation of the methods which were
able to detect 5 out of 9 bad smells 1 ; (3) evaluation of the proposed methods with 2
real-life projects; (4) implementation of NLP parsers with the use of Stanford Parser
and OpenNLP tools.
Context: The research was inspired by the book Patterns for Effective Use Cases by
Adolph et al., which contains recommendations for writing effective use cases.
We considered violations of those recommendations as bad smells in use cases
and started to think about which of them could be detected in an automatic way.
That research was my first step on the way to applying NLP techniques to improving
the quality of use cases, which later allowed the stating and solving of the problems
presented in Chapters 6 and 7.
Objective: The goal of this chapter is to propose NLP-based methods aimed at automatic detection of bad smells in use cases.
1 Duplicated use cases, Complicated Extensions, Repeated actions in neighbouring steps, Lack of the Actor, Misusing Tenses and Verb Forms.
Method: First, the patterns proposed by Adolph et al. in their book were analysed
to decide for which of them violations of the patterns, called bad smells, could be
detected in an automatic way. Next, for each of the chosen bad smells, an auto-
matic method of identification was proposed and implemented with the help
of NLP tools (Stanford Parser, OpenNLP). The usefulness of the proposed approach
was evaluated and described in another paper; see Chapter 5.
Results: We managed to propose methods which were able to automatically identify
the following bad smells in use cases:
• Duplicated use cases
• Lack of the Actor
• Too long or too short use cases
• Inconsistent style of naming
• Too complex sentence structure
• Misusing tenses and verb forms
• Using technical jargon
• Conditional steps
• Complicated extensions
• Repeated actions in neighbouring steps
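As an illustration, a check for one of the simplest of these smells, Lack of the Actor, could be sketched as below. This is a hypothetical simplification: it merely compares the first word of each step against the declared actor list, whereas the methods in this chapter rely on full NLP parsing of the step.

```python
# A hypothetical sketch of a Lack-of-the-Actor check: warn when a step does
# not start with a declared actor name.  The actor set is an assumption.
ACTORS = {"customer", "atm", "system"}

def steps_missing_actor(steps, actors=ACTORS):
    flagged = []
    for step in steps:
        first_word = step.split()[0].lower()
        if first_word not in actors:
            flagged.append(step)    # no actor as subject: who performs the action?
    return flagged

steps = [
    "Customer inserts the card.",
    "Send the message.",
    "ATM verifies the card.",
]
print(steps_missing_actor(steps))
# ['Send the message.']
```

Like all bad-smell detectors, this produces warnings, not verdicts: a flagged step still needs a human to decide whether an actor is genuinely missing.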
Conclusion: Knight et al., in their paper “An improved inspection technique” (Com-
munications of the ACM, 1993), proposed phased reviews, where initial reviews are
quite simple (similar to identifying easy-to-discover bad smells) and can be done by
a secretary or a junior developer. The methods proposed in this chapter show that
such reviews, when applied to use cases, could be done by a computer instead of a
human, and could be a useful add-on to a use-case editor.
4.1 Introduction
Use cases were invented by Ivar Jacobson [60]. They are used to describe func-
tional requirements of information systems in a natural language ([9], [31], [61],
[34]) and are becoming more and more popular. They are extensively used in var-
ious software development methodologies, including the Rational Unified Process [74]
and XPrince [90].
The quality of software requirements, described as use cases or in any other form,
is very important. The later a defect is detected, the more it costs to correct. Ac-
cording to Pressman [99], correcting a defect in requirements at the time of coding
costs 10 times more than correcting the same defect immediately, i.e. at the time
of requirements specification. Thus, one needs quality assurance. As requirements
cannot be tested, the only method is requirements review. During a review, a soft-
ware requirements document is presented to interested parties (including users and
members of the project team) for comment or approval [54]. The defects detected
during a review can be minor (not so important and usually easy to detect) or ma-
jor (important but usually difficult to find). It has been noticed that if a document
contains many easy-to-detect defects, then the review is less effective in detecting ma-
jor defects. Thus, some authors have proposed splitting reviews into two stages (Knight
calls them "phases" [73], Adolph uses the term "tier" [9]): the first stage would concen-
trate on detecting easy-to-detect defects (e.g. compliance of the document with the
required format, spelling, grammar, etc.) and the aim of the second stage would be
to find major defects. The first stage could be performed by a junior engineer or
even by a secretary, while the second stage would require experts.
In this paper a mechanism for supporting use-case reviews, based on natural
language processing tools, is presented. Its aim is to find easy-to-detect defects au-
tomatically, including use-case duplication in the document, an inconsistent style of
naming use cases, too complex sentence structures in use cases, etc. This will save the
time and effort required to perform the task by a human. Moreover, when this mecha-
nism is integrated with a requirements management tool, such as UC Workbench
developed at the Poznan University of Technology ([87], [89]), one can get instant
feedback on simple defects. This can help in learning good practices in requirements
engineering and in obtaining a more ’homogeneous’ document (which is im-
portant when requirements are collected by many analysts in a short time).
The next section describes two main concepts used in the paper: use cases and
bad smells (a bad smell is a probable defect). The capabilities of natural language pro-
cessing tools are presented in Section 3. Section 4 describes defects in use cases that
can be detected automatically. There are three categories of such defects: those concern-
ing the whole document (e.g. use-case duplication), those injected into a single use case
(e.g. an actor appearing in a use case is not described), and those concerning a single
step in a use case (e.g. a too complex structure of the sentence describing the step).
A case study showing the results of applying the proposed method to use cases written
by 4th year students is presented in Section 5. The last section contains conclusions.
4.2 Use Cases and Bad Smells
BAD SMELLS
Use cases are becoming a more and more popular way of describing functional require-
ments ([9], [31], [62]). In this approach, scenarios pointing up the interaction between
users and the system are presented using a natural language. Use cases can be writ-
ten in various forms. The most popular are ’structured’ use cases. An example of a
structured use case is presented in Figure 4.1. It consists of the main scenario and
a number of extensions. The main scenario is a sequence of steps. Each step de-
scribes an action performed by a user or by the system. Each extension contains an
event that can appear in a given step (for instance, event 3.A can happen in Step 3
of the main scenario), and it presents an alternative sequence of steps (actions) that
are performed when the event appears.
Figure 4.1: An example of a use case in a structured form.
The term 'bad smell' was introduced by Kent Beck in the context of source
code ([44]). A bad smell is a surface indication that usually corresponds to a deeper
problem in the code. Detection of a bad smell does not necessarily mean that the code
is incorrect; it should rather be considered a symptom of low readability which may
lead to a serious problem in the future.
In this paper, a bad smell is a probable defect in requirements. Since we are
talking about requirements written in a natural language, in many situations it is
difficult (or even impossible) to say definitely if a suspected defect is present or not
in the document. For instance, it is very difficult to say if a given use case is a dupli-
cation of another use case, especially if the two use cases have been written by two
people (such use cases could contain unimportant differences). Another example
is complexity of sentence structure. If a bad smell is detected, a system displays a
warning and the final decision is up to a human.
4.3 Natural Language Processing Tools for English
Figure 4.2: An example of a Combined Structure generated by the Stanford parser.

Among many natural language processing tools, the Stanford parser [10] is
the most powerful and useful tool from our point of view. The parser has many
capabilities and generates three lexical structures:
• probabilistic context-free grammar (PCFG) structure - a context-free gram-
mar in which each production is augmented with a probability
• dependency structure - represents dependencies between words
• combined structure - a lexicalized phrase-structure which carries both cate-
gory and (part-of-speech tagged) head word information at each node (Figure
4.2)
The combined structure is the most informative and useful. To tag words it uses
the Penn Treebank tag set. As an example, Figure 4.2 presents the structure of the sentence
"User enters input data". In the example the following notation is used: S - sentence,
NP - noun phrase, VP - verb phrase, NN - singular common noun, NNP - singular proper noun,
NNS - plural common noun, VBZ - verb in third person singular form.
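The tagging notation can be illustrated with a toy lookup tagger. This is a sketch only: the hand-made lexicon below is our assumption, whereas the Stanford parser infers tags statistically.

```python
# Toy Penn Treebank tagger for the example sentence (illustrative only; the
# dissertation's analysis relies on the Stanford parser, not on a lexicon).
LEXICON = {
    "User": "NNP",    # singular proper noun
    "enters": "VBZ",  # verb in third person singular form
    "input": "NN",    # singular common noun
    "data": "NNS",    # plural common noun
}

def tag(sentence: str) -> list[tuple[str, str]]:
    """Return (word, Penn-Treebank-tag) pairs using the toy lexicon.
    Unknown words fall back to NN, a common default assumption."""
    return [(w, LEXICON.get(w, "NN")) for w in sentence.split()]
```

For the sentence in Figure 4.2 this yields User/NNP enters/VBZ input/NN data/NNS, matching the notation above.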
Moreover, the Stanford parser generates grammatical structures that represent
relations between individual pairs of words. Figure 4.3 presents the typed
dependencies for the same example sentence. The notation
used to describe grammatical relations is presented in [38].
The ambiguity of natural language and the probabilistic approach used by the Stan-
ford parser cause problems with automatic language analysis. During our research
we have encountered the following problem. One of the features of the English lan-
guage is that there are some words which can be both verbs and nouns. Addition-
ally, the third person singular form of a verb is formed by adding "s" to the verb
base form, and the plural form of a noun is built in a similar way. This leads to situ-
ations in which a verb is confused with a noun. For example, the word "displays" can be
tagged as a verb or a noun. Such confusion may have great influence on the further
analysis. An example of such a situation is presented in Figure 4.4.
Figure 4.3: An example of typed dependencies generated by the Stanford parser.
Figure 4.4: An example of an incorrect sentence decomposition.
In our approach we want to give the analyst suggestions about how
the quality of use cases can be improved. When the analyst gets the information
about the bad smells, he can decide whether these suggestions are reasonable or
not. However, this problem does not seem to be crucial: in our research, which
involved 40 use cases, we have found only three sentences in which this problem
occurred.
4.4 Defects in Use Cases and their Detection
Adolph [9] and Cockburn [31] presented a set of guidelines and good practices about
how to write effective use cases. In this context "effective" means clear, cohesive,
easy to understand and maintain. Reading the guidelines one can distinguish sev-
eral types of defects in use cases. In this section we present those defects that can be
automatically detected. Each defect discussed in the paper is accompanied by a description
of an automatic detection method. The defects have been split into three groups presented
in separate subsections: specification-level bad smells (defect indicators
concerning a set of use cases), use-case level bad smells (defect indicators concern-
ing a single use case), and step-level bad smells (defect indicators concerning a single
step; a use case consists of many steps).
4.4.1 Specification-Level Bad Smells
At the level of requirements specification, where there are many use cases, a quite
common defect which we have observed is use-case duplication. Surprisingly, we
have found such defects even in specifications prepared by well-established Polish
software houses. The most frequent is duplication by information object. If there
are two different information objects (e.g. an invoice and a bill), analysts have a ten-
dency to describe the processes which manipulate them as separate use cases, even
if they are processed in the same way. That leads to unnecessarily thick documenta-
tion (the thicker the document, the longer the time necessary to read it). Moreover,
it is dangerous: when someone finds and fixes a defect in one of the use cases, the
other will remain unchanged, which can be a source of problems in the future. There
are two sources of duplicated use cases:
• Intentional duplication. An analyst prefers that style and/or wants to have
a thick document (many customers still prefer thick documents - for them
they look more serious and dependable, which is of course a myth). Some
such analysts will perhaps ignore that kind of warning, but others - more
proactive - may start to change their style of writing specifications.
• Unintentional duplication. There are several analysts, each of them is writ-
ing his own part of the specification and before the review process no one is
aware of the duplications. If this is the case, the ability to find duplicates in an
automatic way will be perceived as very attractive.
The detection method is two-phase. In the first phase a signature (fingerprint) of
each use case is computed. It can be a combination of a main actor identifier (e.g.
its number) and the number of steps the use case contains. Usually the number of steps
in a use case and the number of actors in the specification are rather small (far less
than 256), thus a signature can be a single integer. If two use cases
have the same signature, they go through the second phase of analysis, during which
they are examined step by step. Step number j in one use case is compared against
step number j in the second use case and a so-called step similarity factor, s, is com-
puted. Those similarity factors are combined into a use-case similarity factor, u. If u is
greater than a threshold then the two use cases are considered similar and a duplication
warning is generated.
A very simple implementation of the above strategy can be the following. We
know that two similar use cases can differ in an information object (a noun). More-
over, we assume that the most important information about processing an information
object is contained in verbs and nouns. Thus, the step similarity factor, si, for steps
number i in the two compared use cases can be computed in the following way:
• If all the corresponding verbs and nouns appearing in the two compared steps
are the same, then si = 1.
• If all the corresponding verbs are the same and all but one corresponding nouns
are the same and a difference in the two corresponding nouns has been ob-
served for the first time, then si = 1 and InfObject1 is set to one of the "con-
flicting" nouns and InfObject2 to the other (InfObject1 describes an informa-
tion object manipulated with the first use case and InfObject2 is manipulated
with the second use case).
• If all the corresponding verbs are the same and all but one corresponding nouns
are the same and the conflicting nouns are InfObject1 in the first use case and
InfObject2 in the second, then si = 1.
• In all other cases, si for the two analyzed steps is 0.
Use-case similarity factor, u, can be computed as a product of the step similarity
factors: u = s1 ∗ s2 ∗ ... ∗ sn.
The described detection method is oriented towards intentional duplication. To
make it effective in the case of unintentional duplication one would need a dictio-
nary of synonyms. Unfortunately, so far we do not have any.
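Assuming each step has already been reduced to its lists of verbs and nouns (in the paper this extraction is done by the Stanford parser), the two-phase strategy can be sketched as follows; the function and variable names are ours, not part of UC Workbench:

```python
# Sketch of the two-phase duplicate detection described above.

def signature(main_actor_id: int, n_steps: int) -> int:
    # Phase 1: cheap fingerprint. Both values are assumed to be < 256,
    # so the pair fits into a single integer.
    return main_actor_id * 256 + n_steps

def usecase_similarity(uc1, uc2) -> float:
    # Phase 2: step-by-step comparison. Each step is a (verbs, nouns) pair.
    # In this simple variant each si is 0 or 1, so the product u is 0 or 1.
    if len(uc1) != len(uc2):          # equal signatures imply equal lengths
        return 0.0
    inf_obj = None                    # (InfObject1, InfObject2) once seen
    for (verbs1, nouns1), (verbs2, nouns2) in zip(uc1, uc2):
        if verbs1 != verbs2 or len(nouns1) != len(nouns2):
            return 0.0                # si = 0 makes the product 0
        diffs = [(a, b) for a, b in zip(nouns1, nouns2) if a != b]
        if not diffs:
            continue                  # si = 1: verbs and nouns all match
        if len(diffs) == 1:
            if inf_obj is None:
                inf_obj = diffs[0]    # first noun conflict: remember objects
                continue              # si = 1
            if diffs[0] == inf_obj:
                continue              # si = 1: consistent substitution
        return 0.0                    # si = 0 in all other cases
    return 1.0

# "Process an invoice" vs. "process a bill": same verbs, one consistent swap.
uc_a = [(["enters"], ["invoice", "data"]), (["prints"], ["invoice"])]
uc_b = [(["enters"], ["bill", "data"]),    (["prints"], ["bill"])]
```

A single mismatching step zeroes the product u, mirroring the all-or-nothing step rules above.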
4.4.2 Use-Case Level Bad Smells
Bad smells presented in this section are connected with the structure of a single use
case. The following bad smells have been selected for automatic detection with
UC Workbench, a use-case management tool developed at the Poznan University of
Technology:
• Too long or too short use cases. It is strongly recommended [9] to keep use
cases 3-9 steps long. Too long use cases are difficult to read and understand.
Too short use cases, consisting of one or two steps, distract the reader from the
context and also make the specification more difficult to understand. To
detect this bad smell it is enough to count the number of steps in each use
case.
• Complicated extension. An extension is designed to be used when an alterna-
tive course of action interrupts the main scenario. Such an exception usually
can be handled with a simple action, after which the flow returns to the main sce-
nario or the use case ends. When the interruption causes the execution of
a repeatable, consistent sequence of steps, then this sequence should be ex-
tracted to a separate use case (Figure 4.5). Detection can be based on counting
the steps within each extension; a warning is generated for each extension with
too many steps.

Figure 4.5: Example of a use case with too complicated extension.

Wrong:
1. Administrator fills in his user name
2. Administrator fills in his telephone number
3. Administrator fills in his email

Correct:
1. Administrator fills in his user name, telephone number and email

Figure 4.6: Example of repeated actions in neighboring steps
• Repeated actions in neighboring steps.
• Inconsistent style of naming
The last two bad smells will be described in the subsequent subsections.
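The first two bad smells boil down to counting steps. A minimal sketch follows; the 3-9 range is taken from [9], while the extension threshold of three steps is our assumption:

```python
# Simple length checks for use cases and their extensions.

def length_warnings(main_steps, extensions, max_ext_steps=3):
    """main_steps: list of step texts; extensions: dict name -> list of steps.
    The max_ext_steps threshold is an assumed, tool-configurable value."""
    warnings = []
    if not 3 <= len(main_steps) <= 9:
        warnings.append("use case should have 3-9 steps")
    for name, steps in extensions.items():
        if len(steps) > max_ext_steps:
            warnings.append(f"extension {name} is too complicated "
                            f"({len(steps)} steps); extract a separate use case")
    return warnings
```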
Repeated Actions in Neighboring Steps
Every step of a use case should represent one particular action. The action may
consist of one or more moves which can be treated as a whole. Every step should
contain significant information which reflects user intent rather than a single move.
Splitting such moves into separate steps may lead to long use cases, bother-
some to read and hard to maintain.
Detection method: Check whether several consecutive steps have the same ac-
tor (subject) and action (predicate). Extraction of the subject and predicate from the
sentence is done by the Stanford parser. The analyst can be informed that such a se-
quence of steps can be combined into a single step.
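Assuming each step has already been reduced to its (subject, predicate) pair, the check can be sketched as:

```python
# Repeated-actions check on pre-extracted (subject, predicate) pairs.
# In the paper the extraction itself is done by the Stanford parser.

def repeated_action_warnings(steps):
    """steps: list of (subject, predicate) pairs, one per step."""
    warnings = []
    for i in range(len(steps) - 1):
        if steps[i] == steps[i + 1]:
            warnings.append(
                f"steps {i + 1} and {i + 2} repeat the action "
                f"'{steps[i][0]} {steps[i][1]}'; consider combining them")
    return warnings

# Figure 4.6: Administrator fills in name / phone / email in three steps.
steps = [("Administrator", "fills"), ("Administrator", "fills"),
         ("Administrator", "fills")]
```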
Wrong: Title: Main Use Case
Correct: Title: Buy a book
Figure 4.7: Example of inconsistent style of naming
Inconsistent Style of Naming
Every use case should have a descriptive name. The title of each use case presents a
goal that the primary actor wants to achieve. There are a few conventions of naming use
cases, but it is preferable to use an active verb phrase in the use case name. Further-
more, the chosen convention should be used consistently in all use cases.
Detection method: An approximate method of detecting this bad smell is to check
whether the use case name satisfies the following constraints:
• The title contains a verb in infinitive (base) form
• The title contains an object
This can be done using the Stanford parser. If the title does not fulfill these con-
straints, a warning is attached to the name.
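An approximate sketch of the title check; the list of base-form verbs is a hand-made assumption standing in for the Stanford parser's part-of-speech information:

```python
# Title check: base-form verb first, then an object.
BASE_VERBS = {"buy", "add", "remove", "send", "register", "log"}  # assumption

def title_warning(title: str):
    """Return a warning string, or None when the title looks fine."""
    words = title.lower().split()
    if not words or words[0] not in BASE_VERBS:
        return "title should start with a verb in base form"
    if len(words) < 2:
        return "title should contain an object"
    return None
```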
4.4.3 Step-Level Bad Smells
Bad smells presented in this section are connected with use-case steps. Steps occur
not only in the main scenario, but also in extensions. The following bad smells are
described here:
Too Complex Sentence Structure
The structure of a sentence used for describing each step of a use case should be as
simple as possible. It means that it should generally consist of a subject, a verb, an
object and a prepositional phrase ([9]). With such a simple structure one can be sure
that the step is grammatically correct and unambiguous. Because of the simplicity,
use cases are easy to read and understand. Such a simple structure also helps
a lot when using natural language tools. It is essential to notice that the simpler the
sentence is, the more precisely other bad smells can be discovered. An example of a
too complex sentence and its corrected version is presented below.
Wrong: The system switches to on-line mode and displays summary information
about data that have to be uploaded and downloaded
Correct: The system displays list of user's tasks

Figure 4.8: Examples of too complex sentence structure

Wrong: The form is filled in
Correct: Student fills in the form

Figure 4.9: Example of lack of the actor

Detection method: Looking at the use cases that were available to us we have
decided to build a simple heuristic that checks whether a step contains more than
one sentence, subject or predicate, coordinate clause, or subordinate clause. If the
answer is YES, then a warning is generated. Obviously, it is just a heuristic. It can
happen that a step contains one of the mentioned defect indicators but is still
readable and easy to understand. Providing a clear and correct answer in all possible
cases is impossible - even two humans can differ in their opinion of what is simple
and readable. In our research we have encountered some examples of this problem.
However, we have distinguished some rules which can be used to verify whether
a sentence is too complex and unreadable. Below we present the verification rules.
The numbers in the brackets show the applicability of a rule (in how many use cases
a given rule could be applied) and its effectiveness (in how many use cases it has
found a real defect approved by a human).
• step contains more than one sentence (2 / 2)
• step contains more than one subject or predicate (2 / 2)
• step contains more than one coordinate clause (8 / 4)
• step contains more than one subordinate clause (7 / 5)
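A rough sketch of the heuristic; the token-based clause counting below is our simplification of the parser-based checks, and the keyword list is an assumption:

```python
# Complexity heuristic approximated with simple token rules (the paper's
# version uses the Stanford parser to count subjects, predicates and clauses).
SUBORDINATORS = {"if", "when", "because", "that", "which", "while"}

def too_complex(step: str) -> bool:
    sentences = [s for s in step.split(".") if s.strip()]
    if len(sentences) > 1:
        return True                       # more than one sentence
    tokens = step.lower().replace(",", " ").split()
    if tokens.count("and") + tokens.count("or") > 1:
        return True                       # more than one coordinate clause
    if sum(t in SUBORDINATORS for t in tokens) > 1:
        return True                       # more than one subordinate clause
    return False
```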
Lack of the Actor
According to [9] it should always be clearly specified who performs the action. The
reader should know which step is performed by which actor. Thus, every step in a
use case should be an action that is performed by one particular actor. The actor's name
ought to be the subject of the sentence; thus, in most cases the name should appear
as the first or second word in the sentence. Omission of the actor's name from the step
may lead to a situation in which the reader does not know who is the executor of the
action.
Detection method: In the case of UC Workbench, every actor must be defined
and each definition is kept within the system. Therefore it can be easily verified
whether the subject of a sentence describing a step is defined as an actor.
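A sketch of the check, assuming the set of defined actors is available from the tool; looking for an actor name among the first two words is the simplification suggested above, in place of full subject extraction:

```python
# Lack-of-the-actor check. In UC Workbench every actor must be defined;
# the example actor set below is an assumption.
ACTORS = {"Student", "Administrator", "System"}

def missing_actor(step: str, actors=ACTORS) -> bool:
    """True when no defined actor appears among the first two words."""
    first_words = step.split()[:2]
    return not any(w.strip(".,") in actors for w in first_words)
```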
Misusing Tenses and Verb Forms
A frequent mistake is to write use cases from the system point of view. It is easier
for the analyst to write in such a manner, but the customer may be confused during
reading. Use cases should be written in a way which is highly readable for everyone.

Wrong: Send the message
Wrong: System is sending an email
Wrong: The email is sent by System
Correct: System sends an email

Figure 4.10: Example of misusing tenses and verb forms

Wrong: User chooses the second tab and marks the checkboxes
Correct: User chooses appropriate options

Figure 4.11: Example of using technical jargon
Therefore the action ought to be described from the user point of view. In order to
ensure this approach, the present simple tense and the active form of the verb should be
used. Moreover, the present simple tense implies that the described action is consistent
and that the system always responds in a determined way.
Detection method: Using the combined structure from the Stanford parser, the nsubj
relation [38] can be determined. It can then be checked whether the subject is an actor and
the verb is in the third person singular active form.
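A sketch of the tense check with a toy tag lookup standing in for the Stanford parser output; the tag table and actor set are assumptions:

```python
# Tense/voice check: the subject must be a defined actor and the verb
# directly after it must be tagged VBZ (third person singular, active).
TAGS = {"sends": "VBZ", "send": "VB", "sending": "VBG", "sent": "VBN"}
ACTORS = {"System", "User"}

def tense_warning(step: str):
    """Return a warning string, or None when the step looks fine."""
    words = step.split()
    if not words or words[0] not in ACTORS:
        return "subject is not a defined actor"
    if len(words) < 2 or TAGS.get(words[1]) != "VBZ":
        return "verb should be in third person singular active form"
    return None
```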
Using Technical Jargon
A use case is a way of describing essential system behavior, so it should focus on the
functional requirements. Technical details should be kept outside of the functional
requirements specification. Using technical terminology might guide developers to-
wards specific design decisions. Graphical user interface (GUI) details clutter the
story, make reading more difficult and the requirements more brittle.
Detection method: A dictionary containing terminology typical of specific
technologies and user interfaces (e.g. button, web page, database, edit
box) should be created. Then it is easy to check whether a step description contains such terms.
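The dictionary lookup can be sketched directly; the dictionary content and the naive plural handling are our assumptions:

```python
# Technical-jargon check: plain dictionary lookup over the step's words.
JARGON = {"button", "checkbox", "tab", "web", "page", "database", "edit", "box"}

def jargon_words(step: str):
    """Return jargon terms found in the step (with naive plural stripping)."""
    hits = []
    for raw in step.lower().split():
        w = raw.strip(".,")
        if w in JARGON or w.rstrip("es") in JARGON or w.rstrip("s") in JARGON:
            hits.append(w)
    return hits
```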
Conditional Steps
A sequence of actions in a use case may depend on some conditions. It is natural to
describe such situations using conditionals: "if condition then action ...". This ap-
proach is preferred by computer scientists, but it can confuse the customer. It can be
especially difficult to read when a nested "if" statement is used in a use case
step. Use cases should be as readable as possible; such a style of writing makes them
complex, hard to understand and follow.
Wrong:
1. Administrator types in his user name and password
2. System checks if the user name and the password are correct
3. If the given data is correct the system logs Administrator in

Correct:
1. Administrator types in his user name and password
2. System finds that the typed-in data are correct and logs Administrator in

Extensions:
2.A. The typed-in data are incorrect
2.A.1. System presents an error message
Figure 4.12: Example of conditional steps
It is preferable to present the optimistic scenario first (the main scenario) and to write
alternative paths separately in the extensions section.
Detection method: The easiest way to detect this bad smell is to look for specific
keywords, such as if, whether, when. Additionally, using the Stanford parser it can
be checked whether the found keyword is a subordinating conjunction. In such a case
the analyst can be notified that the step should be corrected.
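The keyword search can be sketched as follows (without the parser-based confirmation that the keyword is a subordinating conjunction):

```python
# Conditional-step detection via keyword search, as described above.
CONDITIONALS = {"if", "whether", "when"}

def conditional_warning(step: str):
    """Return a warning string, or None when no conditional keyword occurs."""
    for raw in step.lower().split():
        w = raw.strip(".,")
        if w in CONDITIONALS:
            return (f"step contains conditional keyword '{w}'; "
                    "move the alternative flow to an extension")
    return None
```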
4.5 Case Study
In this section we would like to present an example of applying our method to a set
of real use cases. These use cases were written by 4th year students participating in
the Software Development Studio (SDS) which is a part of the Master Degree Pro-
gram in Software Engineering at the Poznan University of Technology. SDS gives the
students a chance to apply their knowledge to real-life projects during which they
develop software for real customers.
Let us consider the use case presented in Figure 4.13. Using the methods presented
in Section 4.4 the following bad smells can be detected:
Misusing Tenses and Verb Forms
• Step: 3. User fill the form
• Tagging (from the Stanford parser): User/NNP fill/VBP the/DT form/NN
NNP - singular proper noun
VBP - verb in non-third person singular present form
DT - determiner
NN - singular common noun
• Typed dependencies (from the Stanford parser):
nsubj(fill-2, User-1) - nominal subject
Figure 4.13: A use case describing how to log in to the TMS Mobile system.
det(form-4, the-3) - determiner
dobj(fill-2, form-4) - direct object
• Conclusion:
From the typed dependencies we can determine the nsubj relation between
User and fill. From the tagging it can be observed that fill is used in the wrong
form (the proper form would be VBZ - verb in third person singular form).
Conditional Step
• Step: 4. The system checks if user entered proper data and loges him in
• Tagging (from the Stanford parser): The/DT system/NN checks/VBZ if /IN user/NN
entered/VBD proper/JJ data/NNS and/CC loges/VBZ him/PRP in/IN
VBZ - verb in third person singular form
IN - subordinating conjunction
VBD - verb in past tense
JJ - adjective
NNS - plural common noun
PRP - personal pronoun
• Combined structure (from the Stanford parser): Presented in Figure 4.14
• Conclusion:
As can be observed, the step contains the word if. Moreover, from the com-
bined structure we can conclude that the word if is a subordinating conjunc-
tion.
Complicated Extensions
• Extension: 4.b User enters login data for the first time
• Symptom: The extension contains three steps.
• Conclusion: The extension scenario should be extracted to a separate use
case.
Figure 4.14: Example of a use case that contains a condition.
4.6 Conclusions
So far about 40 use cases have been examined using our methods of detecting bad
smells. Almost every use case from the examined set contained a bad smell. The most
common bad smells were: Conditional Step, Misusing Tenses and Verb Forms, and
Lack of the Actor. Thus, this type of research can contribute to higher quality of
requirements specifications.
In the future it is planned to extend the presented approach to other languages,
especially to Polish, which is the mother tongue of the authors. Unfortunately, Polish is
much more difficult for automated processing and there is a lack of appropriate tools
for advanced analysis.
Acknowledgements
First of all we would like to thank the people involved in the UC Workbench project.
This work has been financially supported by the State Committee for Scientific
Research as a research grant 91-0269 (years 2006-2008).
Chapter 5
Use-case-based benchmark
specification
Preface
This chapter contains the paper: Bartosz Alchimowicz, Jakub Jurkiewicz, Mirosław
Ochodek, Jerzy Nawrocki, Building Benchmarks for Use Cases, Computing and Infor-
matics, Vol 29, No 1 (2010), pp. 27-44.
My contribution to this chapter included the following tasks: (1) full qualitative
analysis of a set of 524 real-life use cases; (2) part of the quantitative analysis of the
set of use cases; (3) introducing changes to the benchmark specification according
to the results of the qualitative analysis; (4) performing a case study with the tool for
automatic detection of bad smells described in the previous chapter.
Context: Many concepts in requirements engineering require empirical evaluation.
Such an evaluation should satisfy the following requirements: (1) evaluation exper-
iments should be replicable, and (2) they should be based on representative data
or artifacts, coming from real-life projects. In the context of use cases the prob-
lem is that real-life software specifications are of commercial value and usually they
cannot be published, which makes such an experiment non-replicable at another
evaluation site.
Objective: The goal of the research presented in this chapter was to create a bench-
mark software specification that could be used in replicable evaluation experiments
concerning methods and tools oriented toward use cases.
Method: To create the benchmark specification, 524 use cases from 16 industrial
and academic projects were analysed. The analysis was twofold: (1) quantitative
analysis focused on measuring simple attributes of use cases (e.g. number of steps
33
in main scenarios, steps with validation actions); (2) qualitative analysis focused on
measuring the frequency of bad smells in the available use cases.
Results: Based on the results of the quantitative and qualitative analysis, a profile
of a typical use-case-based specification was created. Moreover, using this profile,
two specifications resembling the typical specification were composed: (1) a clean
specification without bad smells; (2) a specification contaminated with bad smells.
Conclusion: The proposed benchmark specification can be used in experiments
concerning methods and tools for use case analysis. It has been used to evaluate
simple methods of quality assurance for use cases (presented in Chapter 4), and to
study the effectiveness of events identification performed by a human (Chapter 6) or
with the help of an automatic tool (Chapter 7). Moreover, it has already been used
by other researchers (e.g. by Rehan Rauf et al. from University of Waterloo in their
work presented at the IEEE International Conference on Requirements Engineering
RE-2011).
5.1 Introduction
Functional requirements are very important in software development. They im-
pact not only the product itself but also test cases, cost estimates, delivery date,
and user manual. One of the forms of functional requirements are use cases in-
troduced by Ivar Jacobson about 20 years ago. They are getting more and more pop-
ular in software industry and they are also a subject of intensive research (see e.g.
[18, 21, 39, 41, 110]). Ideas presented by researchers must be empirically verified (in the
best case, using industrial data) and it should be possible to replicate a given exper-
iment by any other researcher [108]. In the case of research on use cases it means that one
should be able to publish not only one's results but also the functional require-
ments (use cases) that have been used during the experiment. Unfortunately, that is
very seldom done because it is very difficult. If a given requirements specification
has a real commercial value, it is hard to convince its owner (a company) to publish it.
Thus, to make experiments concerning use cases replicable, one needs a benchmark
that would represent use cases used in real software projects.
In this paper an approach to constructing a use-case benchmark is presented. The
benchmark itself is a set of use-case-based requirements specifications which have
a typical profile observed in requirements coming from real projects. It includes
both quantitative and qualitative aspects of use cases. To derive such a profile, an
extensive analysis of 524 use cases was performed.
The paper is organised as follows. In Section 5.2 a model of use-case-based re-
quirements specification is presented. This model is further used in Sections 5.3 and
5.4, which present the analysis of the use cases coming from 16 projects. Based on
this analysis, a quantitative and qualitative profile of the typical use-case-based spec-
ification is derived, which is used to create benchmark specifications in Section
5.5. Finally, a case study is presented in Section 5.6, which demonstrates a typical
usage of the benchmark specification to compare tools for use-case analysis.
5.2 Benchmark-oriented model of use-case-based
specification
Although use cases have been successfully used in many industrial projects (Neil et
al. [92] reported that 50% of projects have their functional requirements presented
in that form), they have never been a subject of any recognisable standardisation.
Moreover, since their introduction by Ivar Jacobson [61], many approaches to their
development have been proposed. For example, Russell R. Hurlbut [53] gathered fifty-
seven contributions concerning use-case modeling. Although all the approaches share
the same idea of presenting an actor's interaction with the system in order to obtain
his goal, they vary in their level of formalism and presentation form. Thus it is very
important to state what is understood by the term use-case-based requirements
specification.
To mitigate this problem, a semi-formal model of use-case-based requirements
specification has been proposed; it is presented in Figure 5.1. It incorporates most
of the best practices presented by Adolph et al. [9] and Cockburn [31].
However, it still allows one to create a specification that contains only a scenario (or a non-
structured story) and actors, which is enough to compose a small but valuable use
case. This specification can be further extended with other elements like extensions
(using steps or stories), pre- and post-conditions, notes or triggers.
Unfortunately, understanding the structure of the specification is still not enough
to construct a typical use-case-based specification. What is missing is
the quantifiable data concerning the number of model elements and the proportions
between them1. In other words, one meta-question has to be asked for each element
of the model: "how many?".
Another aspect is the content of use cases. Here, what is required is knowl-
edge about the language structures that people use to author use cases. This in-
cludes also a list of typically-made mistakes.
1 A number/proportion of any element of the model will be further addressed as a property of the specification.
Figure 5.1: Use-Cases-based functional requirements specification model (a UML class diagram relating Requirements Specification, Use Case, Actor, Business Object, Business Rule, Goal Level (Business, User, Sub-function), Trigger, Condition, Scenario, Step, Story and Extension, together with main/secondary actors, pre-/post-conditions and sub-scenarios)
5.3 Analysis of uc-based specifications structure
In order to discover the profile of a typical specification, data from various projects
was collected and analysed. As a result a database called UCDB (Use Cases Database)
[6] has been created. At this stage it stores data from 16 projects with a total number
of 524 use cases (basic characteristics of requirements specifications as well as short
description of projects are presented in Table 5.1).
Specifications have been analysed in order to populate the model with the in-
formation concerning average number of its elements and additional qualitative as-
pects of use cases (see Section 5.4).
One of the most interesting findings is that 65.8% of the main scenarios in use
cases consist of 3-9 steps, which means that they fulfil the guidelines proposed by
Cockburn [31]. What is more, 72.1% of the analysed use cases have alternative sce-
narios (extensions). There are projects in which use cases have extensions attached
to all steps in the main scenarios; however, on average a use case contains between 1 and
2 extensions (mean 1.57). Detailed information regarding the number and distribu-
tions of steps in main scenarios and extensions is presented in Figure 5.2.
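The two quantitative measures reported above can be computed directly from per-use-case data; the sample records below are illustrative, not taken from UCDB:

```python
# Sketch of the quantitative profiling step: share of main scenarios with
# 3-9 steps, and mean number of extensions per use case.
use_cases = [
    {"main_steps": 4, "extensions": 1},
    {"main_steps": 7, "extensions": 3},
    {"main_steps": 12, "extensions": 0},
    {"main_steps": 5, "extensions": 2},
]

in_guideline = sum(1 for uc in use_cases if 3 <= uc["main_steps"] <= 9)
share_3_to_9 = 100.0 * in_guideline / len(use_cases)        # percentage
mean_extensions = sum(uc["extensions"] for uc in use_cases) / len(use_cases)
```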
Another interesting observation concerning the structure of use cases is that
Table 5.1: Overview of the projects and their requirements specifications (origin: industry - project developed by a software development company, s2b - project developed by students for an external organisation, external - specification obtained from an external source freely accessible through the Internet; projects D and K come from the same organisation). Columns: ID, specification language, origin, number of use cases, and the share of use cases at the Business / User / Sub-function goal levels.

ID         Language  Origin  Use cases  Business  User  Sub-function
Project A  English   S2B     17         0%        76%   24%
Project B  English   S2B     37         19%       46%   35%
Project C  English   -       39         18%       44%   33%
Project D  -         -       77         0%        96%   4%
Project E  -         S2B     41         0%        100%  0%
Project F  -         -       10         0%        100%  0%
Project G  English   -       90         0%        81%   19%
Project H  -         -       16         19%       56%   25%
Project I  -         -       21         38%       57%   5%   (banking system)
Project J  -         -       9          0%        67%   33%
Project K  -         -       75         0%        97%   3%
Project L  English   -       16         0%        31%   69%
Project M  English   -       26         0%        23%   77%
Project N  English   -       18         0%        0%    100%
Project O  English   -       16         0%        31%   69%
Project P  English   -       16         0%        25%   75%

Project descriptions given in the table: web & standalone application for managing members of an organization; web-based Customer Relationship Management (CRM) system; UK Collaboration for a Digital Repository (UKCDR) [external]; web-based e-government Content Management System (CMS) [Polish, industry]; web-based Document Management System (DMS) [Polish]; web-based invoices repository for remote accounting [Polish, industry]; Protein Information Management System (PIMS) [external]; integration of two sub-systems in an ERP-scale system [Polish, industry]; single functional module for a web-based e-commerce solution [Polish, industry]; web-based workflow system with Content Management System (CMS) [Polish, industry]; Polaris - Mission Data System (MDS) process demonstration [external]; Vesmark Smartware™ - financial decision system [external]; Photo Mofo - digital image management [external]; iConf - Java-based conference application [external]; One Laptop Per Child - web-based content management [external].
the sequences of steps performed by the same actor are frequently longer than one.
What is more interesting, this tendency is more visible in the case of main actor steps
(38.4% of main actor step sequences are longer than one step, compared with only
25.5% in the case of the secondary actor - in most cases the system being developed).
One of the potential reasons is that the actions performed by the main actor are more
important from the business point of view. This contradicts the concept of a transaction
presented by Ivar Jacobson [60]. Jacobson enumerated four types of actions which
together form a use-case transaction. Only one of them belongs to the main actor
(the user request action). The rest of them are system actions: validation, internal
state change, and response. It might seem that those actions should frequently form
longer sequences. However, 74.6% of step sequences performed by the system consist of a single step.
The distributions and number of steps in main actor sequences are presented in Figure 5.3.

Figure 5.2: Scenario lengths in the analysed use cases: a) histogram of the number of steps in the main scenario (data aggregated from all projects), b) box plot of the number of steps in the main scenario (in each project), c) histogram of the number of steps in an extension (data aggregated from all projects), d) box plot of the number of steps in an extension (in each project; note that project E was excluded because it has all alternative scenarios written as stories)
If we look deeper into the textual representation of the use cases, some interest-
ing observations concerning their structure can be made. One of them regards the
way authors of use cases describe validation actions. Two different approaches are
observed. The most common is to use extensions to represent alternative system
behaviour in case of failure of the verification process (41.3% of extensions are of this
kind). The second, less frequent, approach is to incorporate validation actions
into steps (e.g. System verifies data). Such actions are observed in only 3.4%
of steps.
The analysed properties can be classified into two separate classes. The first one
contains properties which are observed in nearly all of the projects, with compara-
ble intensity. Such properties seem to be independent of the specification they
belong to (and of its author's writing style). The second class is the opposite: it
includes properties which are either characteristic only of a certain set of use cases,
or whose occurrence in different specifications lies between the two extremes of
full presence and marginal/none. Such properties are project-dependent.
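The split between the two classes can be illustrated with a simple variability measure; the sketch below uses the coefficient of variation over made-up per-project intensities (the 0.5 threshold is an arbitrary illustration, not a rule from this analysis):

```python
from statistics import mean, stdev

# Hypothetical per-project intensities (fractions) of two properties.
properties = {
    "steps_with_validation": [0.03, 0.04, 0.05, 0.03, 0.04],  # similar in all projects
    "use_cases_with_triggers": [1.0, 0.0, 0.0, 1.0, 0.0],     # all-or-nothing per project
}

for name, values in properties.items():
    cv = stdev(values) / mean(values)  # coefficient of variation
    kind = "specification-independent" if cv < 0.5 else "specification-dependent"
    print(f"{name}: cv={cv:.2f} -> {kind}")
```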
Figure 5.3: Distributions of actor step-sequence lengths in the analysed use cases: a) histogram of the lengths of the step sequences performed by the main actor, b) histogram of the lengths of the step sequences performed by the secondary actor - e.g. the System
A more detailed description of the analysed requirements specifications is presented in
Table 5.2.
5.4 Analysis of use-cases-based specifications defects
Use cases seem simple in their structure and easy to understand; however, it is easy
to introduce defects, which might influence the communication between analysts,
customers and other project stakeholders. Different studies [7, 93] indicate that vague,
inconsistent or incomplete requirements are among the main reasons for project failures.
Defects make a specification cumbersome, tedious and hard to read, analyse and change;
hence they can lead to misunderstandings and finally to higher costs of software
projects. Based on the patterns and guidelines presented by Adolph et al. and Cock-
burn [9, 31], a set of defect types has been distinguished. The projects stored in UCDB
have been analysed in order to collect statistical data about defects found in the re-
quirements specifications. Table 5.3 presents the results of this analysis.
One of the main observations is that almost 86% of the analysed use cases con-
tain at least one defect, which proves that it is not easy to provide a requirements
specification of good quality. For researchers who try to analyse use-case quality,
knowledge of the average defect density can be used to tune their methods and
tools in order to make them more effective and efficient.
A large number of different defects can be found in most of the projects. For ex-
ample, steps without an explicitly specified actor can be found in 13 projects, 11 projects
Table 5.2: Results of the structure analysis of the use cases stored in UCDB

Requirements-specification-independent properties (Overall, then projects A-P):

Property                                   Overall  A      B      C      D      E      F      G      H      I      J      K      L      M      N      O      P
Number of use cases                        524      17     37     39     77     41     10     90     16     21     9      75     16     26     18     16     16
Steps in main scenario (mean)              4.82     4.76   4.92   2.95   4.34   5.78   3.90   5.10   3.38   4.33   3.67   6.36   3.44   6.58   5.22   2.88   3.63
Steps in main scenario (SD)                2.41     0.97   1.46   2.69   2.25   1.26   1.20   2.41   0.50   2.01   1.32   3.25   1.67   1.50   1.40   1.20   1.50
Use cases with extensions                  72.1%    94.1%  51.4%  41.0%  55.8%  92.7%  100%   68.9%  81.3%  90.5%  55.6%  98.7%  75.0%  100%   38.9%  50.0%  62.5%
Extensions per use case (mean)             1.57     1.29   0.57   0.95   0.92   1.63   1.00   1.28   2.94   2.33   1.00   2.69   1.75   4.46   0.39   0.63   0.88
Extensions per use case (SD)               1.88     0.77   0.60   1.72   1.12   0.62   0.00   1.30   2.98   1.58   1.58   2.91   1.57   1.61   0.50   0.81   0.81
Steps in extension (mean)                  2.46     1.80   2.52   1.00   1.45   N/A    1.10   2.77   2.30   1.64   1.44   3.09   N/A    N/A    2.67   1.30   1.36
Steps in extension (SD)                    1.61     0.84   0.87   0.00   0.59   N/A    0.32   1.87   0.65   0.67   0.53   1.80   N/A    N/A    2.89   0.48   0.50
Steps with validation actions              3.4%     3.7%   5.5%   11.3%  1.8%   0.0%   0.0%   0.0%   27.8%  17.6%  3.0%   0.0%   0.0%   15.2%  0.0%   2.2%   3.4%
Extensions which are validations           41.3%    9.1%   76.2%  64.9%  63.4%  40.3%  100%   50.4%  78.7%  89.8%  100%   15.3%  60.7%  21.6%  0.0%   33%    50.0%
Main actor's sequence length = 1           61.6%    42.3%  93.1%  88.6%  37.0%  61.9%  87.5%  91.7%  47.4%  52.4%  50.0%  52.6%  45.5%  67.4%  52.8%  71.4%  45.0%
Main actor's sequence length = 2           21.6%    23.1%  3.5%   6.8%   34.8%  36.1%  12.5%  0.0%   0.0%   19.1%  42.9%  16.4%  27.3%  32.7%  33.3%  14.3%  10.0%
Main actor's sequence length = 3           10.2%    26.9%  3.5%   4.6%   16.3%  0.0%   0.0%   8.3%   31.6%  23.8%  7.1%   12.9%  22.7%  0.0%   11.1%  14.3%  30.0%
Main actor's sequence length = 4           3.8%     7.7%   0.0%   0.0%   6.5%   0.0%   0.0%   0.0%   21.1%  0.0%   0.0%   8.6%   4.6%   0.0%   2.8%   0.0%   15.0%
Main actor's sequence length > 4           2.8%     0.0%   0.0%   0.0%   5.4%   2.1%   0.0%   0.0%   0.0%   4.8%   0.0%   9.5%   0.0%   0.0%   0.0%   0.0%   0.0%
Secondary actor's sequence length = 1      74.6%    82.6%  96.3%  72.2%  67.9%  100%   100%   90.0%  50.0%  12.5%  62.5%  72.1%  100.0% 0.0%   0.0%   60.0%  10.0%
Secondary actor's sequence length = 2      18.8%    17.4%  3.7%   22.2%  29.6%  0.0%   0.0%   8.0%   16.7%  43.8%  37.5%  14.8%  0.0%   83.3%  68.4%  40.0%  0.0%
Secondary actor's sequence length = 3      4.3%     0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   2.0%   33.3%  25.0%  0.0%   7.4%   0.0%   16.7%  26.3%  0.0%   0.0%
Secondary actor's sequence length = 4      1.6%     0.0%   0.0%   5.6%   2.5%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   4.1%   0.0%   0.0%   5.3%   0.0%   0.0%
Secondary actor's sequence length > 4      0.8%     0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   18.8%  0.0%   1.6%   0.0%   0.0%   0.0%   0.0%   0.0%

Requirements-specification-dependent properties:

Property                                   Overall  A      B      C      D      E      F      G      H      I      J      K      L      M      N      O      P
Use cases with additional description      38.4%    0.0%   0.0%   100%   0.0%   0.0%   0.0%   100%   56.3%  100%   0.0%   6.7%   100%   96.3%  0.0%   0.0%   6.3%
Use cases with sub-scenario                12.4%    0.0%   0.0%   0.0%   32.5%  0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   44.2%  0.0%   0.0%   33.3%  0.0%   0.0%
Steps in sub-scenario (mean)               1.92     0.00   0.00   0.00   3.20   0.00   0.00   0.00   0.00   0.00   0.00   1.09   0.00   0.00   1.33   0.00   0.00
Steps in sub-scenario (SD)                 1.62     N/A    N/A    N/A    2.02   N/A    N/A    N/A    N/A    N/A    N/A    0.29   N/A    N/A    0.52   N/A    N/A
Use cases with pre-conditions              37.4%    82.4%  100%   0.0%   0.0%   97.6%  0.0%   0.0%   0.0%   23.8%  77.8%  0.0%   81.3%  38.5%  0.0%   0.0%   56.3%
Use cases with post-conditions             14.3%    0.0%   0.0%   97.4%  0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   100%   100%   0.0%   0.0%   12.5%
Use cases with triggers                    33.0%    100%   100%   100%   0.0%   100%   0.0%   0.0%   68.8%  57.1%  0.0%   0.0%   0.0%   0.0%   0.0%   100%   0.0%
Steps with reference to use cases          6.4%     0.0%   0.5%   0.0%   0.0%   0.0%   25.6%  14.2%  0.0%   0.0%   6.1%   33.3%  0.0%   0.6%   0.0%   0.0%   0.0%
Extensions with scenario                   66.8%    27.3%  100%   8.1%   87.3%  0.0%   100%   100%   100%   100%   100%   100%   0.0%   0.0%   42.9%  100.0% 100%
Extensions with stories                    33.2%    72.7%  0.0%   91.9%  12.7%  100%   0.0%   0.0%   0.0%   0.0%   0.0%   0.0%   100%   100%   57.1%  0.0%   0.0%
Explicitly defined Business Rules          N/A      N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A
Table 5.3: Results of the quality analysis of the use cases stored in UCDB

Use-case-set level:

Property                                            Overall  A       B       C       D       E       F       G       H       I       J       K       L       M       N       O       P
Use case duplicates (the same actions
operating on different business objects)            2.55%    0.00%   0.00%   0.00%   0.00%   26.83%  0.00%   2.22%   0.00%   0.00%   0.00%   0.00%   6.25%   0.00%   0.00%   0.00%   56.25%
Lack of hierarchical structure (Y/N)                -        Y       Y       Y       N       Y       N       N       Y       Y       N       N       N       N       Y       Y       Y

Use-case level:

Use cases whose name does not describe the goal     4.76%    11.76%  0.00%   51.28%  0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   31.25%  6.25%
Non-detectable conditions in extensions             4.25%    4.55%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   8.51%   0.00%   0.00%   14.85%  0.00%   0.00%   0.00%   0.00%   0.00%
Extensions without scenarios                        35.80%   77.27%  0.00%   86.49%  6.49%   100%    0.00%   5.22%   14.89%  8.16%   0.00%   0.00%   100%    100%    0.00%   100%    0.00%
Extensions with nested scenarios                    4.13%    0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   8.51%   8.16%   0.00%   12.87%  0.00%   0.00%   0.00%   0.00%   0.00%

Scenario level:

Use cases with less than 3 steps in main scenario   11.39%   0.00%   0.00%   51.28%  11.69%  0.00%   20.00%  15.56%  0.00%   9.52%   22.22%  0.00%   37.50%  7.69%   0.00%   37.50%  25.00%
Use cases with more than 9 steps in main scenario   6.80%    0.00%   0.00%   5.13%   7.79%   0.00%   0.00%   5.56%   0.00%   0.00%   0.00%   36.00%  0.00%   0.00%   0.00%   0.00%   0.00%
Lack of interactions between actors                 22.11%   11.76%  10.81%  56.41%  48.05%  9.76%   0.00%   0.00%   50.00%  28.57%  0.00%   12.00%  37.50%  3.85%   44.44%  87.50%  56.25%
Use cases with at least double nested sub-scenarios 1.53%    0.00%   0.00%   0.00%   10.39%  0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   1.33%   0.00%   19.23%  0.00%   0.00%   0.00%

Step level (steps from main scenario and from extensions):

Steps with different tenses used                    0.80%    1.11%   0.85%   5.08%   0.00%   0.00%   0.00%   0.00%   0.00%   5.95%   0.00%   0.18%   0.00%   0.00%   6.86%   0.00%   5.17%
Steps in which user-interface terms are used        4.56%    0.00%   0.42%   0.00%   0.00%   0.00%   0.00%   12.87%  0.00%   0.00%   2.17%   0.36%   1.82%   0.00%   47.06%  16.67%  1.30%
Steps in which technical terms are used             0.29%    0.00%   0.00%   0.85%   0.24%   0.00%   0.00%   0.00%   0.00%   1.19%   0.00%   0.09%   0.00%   0.00%   5.88%   0.00%   0.00%
Steps in which no actor is specified                19.37%   0.00%   1.69%   38.14%  0.94%   0.84%   0.00%   45.05%  0.68%   0.60%   0.00%   0.54%   0.00%   0.58%   13.73%  51.67%  20.78%
Steps in which passive voice is used                3.63%    1.11%   0.00%   55.93%  0.24%   0.84%   0.00%   2.70%   0.00%   0.00%   0.00%   0.27%   0.00%   0.00%   13.73%  31.67%  16.88%
Steps with complex sentence structure or
more than one sentence                              2.51%    0.00%   5.08%   27.12%  0.71%   0.00%   0.00%   1.16%   0.68%   2.38%   2.17%   1.45%   1.82%   2.92%   9.80%   1.67%   2.60%
Steps with possibility of different interpretations 3.79%    0.00%   16.95%  11.02%  0.00%   0.00%   4.00%   0.77%   0.00%   0.00%   4.35%   4.09%   1.82%   12.28%  6.86%   11.67%  2.60%
Steps with details of elements of business objects  3.71%    4.44%   0.85%   3.39%   3.30%   0.00%   0.00%   11.45%  0.00%   0.00%   8.70%   0.91%   0.00%   7.60%   0.98%   3.33%   0.00%
Steps with conditional clauses                      1.22%    0.00%   0.00%   4.24%   0.71%   0.00%   0.00%   0.00%   0.00%   0.00%   0.00%   3.00%   0.00%   0.58%   3.92%   1.67%   0.00%
Steps with language mistakes                        3.16%    0.00%   47.03%  0.00%   0.00%   0.00%   0.00%   0.00%   2.05%   0.60%   0.00%   0.00%   0.00%   0.00%   2.94%   0.00%   5.19%
have some extensions missing alternative scenarios, and the same number of projects
have use cases with less than 3 steps. Some of the defects are, again, unique to
some specifications. For instance, language mistakes were found mainly in project
B (over 47% of its steps have language mistakes), while most of the projects did not
have such problems. Scenarios longer than 9 steps are another example of a project-
specific defect (it mainly relates to Project K, which has over 30% of use cases with
this defect). The simple explanation for this is that different authors have different
writing styles and are probably not aware of some problems and defects.
Another interesting finding is that one defect can lead to another defect. The
analysis shows that steps without an actor tend to be written using passive voice
(projects C, N, O, P). A similar situation can be observed in the case of language
mistakes and complex step structure, which often lead to different possible
interpretations (projects B, C, N).
5.5 Building referential specification
By combining the use-case model and the average values coming from the analysis,
a profile of the typical use-cases-based requirements specification can be derived. In
such a specification all properties would appear with the typical intensity. In other
words, an analysis performed on such a document could be perceived as an every-day
task for use-case analysis tools.
There are at least two approaches to acquiring an instance of such a specification.
The first one would be to search for an industrial set of use cases which fulfils the
given criteria. Unfortunately, most industrial documents are confidential; there-
fore they cannot be used at a large scale. The second, obvious approach would
be to develop such a specification from scratch. If it is built according to the ob-
tained typical-specification profile, it might be used instead of an industrial specifica-
tion. Since it would describe some abstract system, it could be freely distributed and
used by anyone.
Therefore, we would like to propose an approach to developing an instance of the
referential specification for benchmarking purposes. The document is available at
the web site [6] and might be freely used for further research.
5.5.1 Specification domain and structure
It seems that a large number of tools for use-case analysis have their roots in the
research community. Therefore, the domain of the developed specification should
be easy to understand, especially for academic staff. One of the well-recognised
processes at a university is the student admission process. Although the developed
specification describes a hypothetical process and tools, it should be easy to understand
for anyone who has ever participated in a similar process.
Another important decision concerns the structure of the specification (the number
of actors, use cases and business objects). Unfortunately, this decision is rather ar-
bitrary, because the number of use cases, actors and business objects may depend on
the size of the system and the level of detail incorporated into its description. In this
case the median numbers of use cases and actors from the UCDB set have been used
as the numbers of use cases and actors in the constructed specification. The first ver-
sion of the specification consisted of 7 actors (Administrator, Candidate, Bank, Selection
committee, Students Management System, System, User), 37 use cases (3 business,
33 user, and 1 sub-function level), and 10 business objects. After extending UCDB
with data from 5 additional projects, the median number of use cases decreased to
20. Therefore, we decided to prepare two versions of the specification. The larger
one has 37 use cases and is labelled FULL, while in the smaller specification
(SMALL) the number of use cases was reduced to 20.
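As a quick cross-check, the per-project use-case counts extracted from Table 5.1 can be fed to a median computation (illustrative only; the text reports a rounded median of about 20 for the extended UCDB):

```python
from statistics import median

# Use-case counts per project (A-P), as listed in Table 5.1.
counts = [17, 37, 39, 77, 41, 10, 90, 16, 21, 9, 75, 16, 26, 18, 16, 16]

# For an even number of projects, the median is the midpoint
# of the two middle values (here 18 and 21).
print(median(counts))
```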
5.5.2 Use cases structure
A breadth-first rule has been followed in order to construct the referential specification.
Firstly, all use cases were titled and augmented with descriptions. Secondly, all of
them were filled with main scenarios and corresponding extensions. During that
process all changes made to the specification were recorded in order to check confor-
mance with the typical profile. This iterative approach was used until the final ver-
sions of both the SMALL and FULL quantitative specifications were constructed. Their
profiles are presented in Table 5.4, in comparison to the corresponding average val-
ues from the analysis of the UCDB projects. The most important properties are those
which are observed in all of the specifications (specification-independent). For those
metrics, most values in the referential document are very close to those derived from
the analysis. The situation differs for properties which were characteristic only
of certain projects (or whose variability between projects was very high). If the con-
structed specification is supposed to be a typical one, it should not be biased by fea-
tures observed only in some of the industrial specifications. Therefore, dependent
properties were also incorporated into the referential use cases; however, they were
not subject to the tuning process.
In addition, based on each of the quantitative specifications, a corresponding
qualitative specification was proposed. In this case we focused on introducing changes
which led to obtaining a qualitative profile close to the typical profile.
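The conformance check performed after each change can be thought of as comparing the current profile against the target profile within some tolerance; the sketch below is a conceptual illustration only (the target values come from the UCDB averages in Table 5.4, while the 5% tolerance is an arbitrary choice, not the dissertation's):

```python
# Target profile (UCDB averages) vs. the profile of the specification being tuned.
target = {"mean_steps": 4.82, "uc_with_extensions": 0.721, "mean_extensions": 1.57}
current = {"mean_steps": 4.81, "uc_with_extensions": 0.703, "mean_extensions": 1.50}

def conforms(current, target, rel_tol=0.05):
    """True when every tracked property is within rel_tol of its target value."""
    return all(abs(current[k] - target[k]) <= rel_tol * target[k] for k in target)

print(conforms(current, target))  # → True
```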
Table 5.4: Referential specifications profiles in comparison to the average profile derived from the analysis of the use cases stored in UCDB (Quant. = Quantitative Referential Specification Admission System 2.0, Qual. = Qualitative Referential Specification Admission System 2.0, UCDB = observed in Use-Cases Database)

Property                                            Quant. SMALL  Quant. FULL  UCDB    Qual. SMALL  Qual. FULL
Number of steps in main scenario (mean)             4.80          4.81         4.82    -            -
Number of steps in main scenario (SD)               1.21          1.50         2.41    -            -
Use cases with extensions                           75.0%         70.3%        72.1%   -            -
Number of extensions in use case (mean)             1.60          1.50         1.57    -            -
Number of extensions in use case (SD)               0.71          0.69         1.88    -            -
Number of steps in extension (mean)                 2.29          2.49         2.46    -            -
Number of steps in extension (SD)                   1.81          2.12         1.61    -            -
Steps of validation nature                          4.2%          2.3%         3.4%    -            -
Extensions of validation nature                     62.5%         46.2%        41.3%   -            -
Main actor's sequence length = 1                    64.5%         61.7%        61.6%   -            -
Main actor's sequence length = 2                    19.4%         21.7%        21.6%   -            -
Main actor's sequence length = 3                    12.9%         10.0%        10.2%   -            -
Main actor's sequence length = 4                    3.2%          3.3%         3.8%    -            -
Main actor's sequence length > 4                    0.0%          3.3%         2.8%    -            -
Secondary actor's sequence length = 1               73.5%         79.7%        74.6%   -            -
Secondary actor's sequence length = 2               23.5%         15.3%        18.8%   -            -
Secondary actor's sequence length = 3               2.9%          3.4%         4.3%    -            -
Secondary actor's sequence length = 4               0.0%          1.7%         1.6%    -            -
Secondary actor's sequence length > 4               0.0%          0.0%         0.8%    -            -
Use case duplicates                                 -             -            2.55%   5.00%        2.70%
Use cases whose name does not describe the goal     -             -            4.76%   5.00%        5.41%
Non-detectable conditions in extensions             -             -            4.25%   4.17%        5.13%
Extensions without scenarios                        -             -            35.80%  33.33%       33.33%
Extensions with nested scenarios                    -             -            4.13%   4.17%        5.13%
Use cases with less than 3 steps in main scenario   -             -            11.39%  10.00%       10.81%
Use cases with more than 9 steps in main scenario   -             -            6.80%   5.00%        5.41%
Lack of interactions between actors                 -             -            22.11%  25.00%       24.32%
Use cases with at least double nested sub-scenarios -             -            1.53%   0.00%        2.70%
Steps with different tenses used                    -             -            0.80%   0.78%        0.80%
Steps in which user-interface terms are used        -             -            4.56%   4.59%        4.78%
Steps in which technical terms are used             -             -            0.29%   0.00%        0.40%
Steps in which no actor is specified                -             -            19.37%  19.53%       19.12%
Steps in which passive voice is used                -             -            3.63%   3.91%        3.59%
Steps with complex sentence structure or
more than one sentence                              -             -            2.51%   2.34%        2.39%
Steps with possibility of different interpretations -             -            3.79%   3.91%        3.98%
Steps with details of elements of business objects  -             -            3.71%   3.91%        3.59%
Steps with conditional clauses                      -             -            1.22%   1.56%        1.59%
Steps with language mistakes                        -             -            3.16%   3.13%        3.19%
5.6 Benchmarking use cases - case study
Having an example of the typical requirements specification, one can wonder how
it could be used. Firstly, researchers who construct methods and tools to
analyse use cases often face the problem of evaluating their ideas. Using a spec-
ification coming from an industrial project is a typical approach; however, this can-
not lead to the conclusion that the given solution would give the same results for
other specifications. The situation looks different when the typical specification is
considered: then researchers can assume that their tool would work for most of the
industrial specifications. Secondly, analysts who want to use some tools to analyse
their requirements can have a problem choosing the best tool that would meet their
needs. With the typical specification in mind, it can be easier to evaluate the avail-
able solutions and choose the most suitable one. To conclude, the example of the
typical requirements specification can be used:
• to compare two specifications and assess how similar a given specification is
to the typical one
• to assess the time required for a tool to analyse a requirements specification
• to assess the quality of a tool/method
• to compare tools and choose the best one
In order to demonstrate the usage of the constructed specification, we have con-
ducted a case study using the Qualitative Reference Specification Admission System
2.0 FULL. As a tool for managing requirements in the form of use cases, we chose the
UC Workbench tool [87], which has been developed at Poznan University of Tech-
nology since 2005. Additionally, we used a set of tools for defect detection
[30] to analyse the requirements. The aim of these tools is to find potential
defects and present them to the analyst during the development of the requirements
specification. The tools are based on Natural Language Processing (NLP) meth-
ods and use the Stanford parser [37] and/or OpenNLP [1] tools to perform the NLP
analysis of the requirements.
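To give a flavour of what such step-level checks do, here is a deliberately simplified sketch; the actual tools rely on full NLP parsing (Stanford parser / OpenNLP), not regular expressions, and the actor list here is hypothetical:

```python
import re

# Hypothetical actor names for an admission-system specification.
KNOWN_ACTORS = ("System", "Candidate", "Administrator", "Student")

def check_step(step):
    """Flag a few of the step-level defect types from Table 5.3 (heuristic sketch)."""
    defects = []
    if not step.startswith(KNOWN_ACTORS):
        defects.append("no actor specified")
    if re.search(r"\b(is|are)\s+\w+ed\b", step):
        defects.append("passive voice")
    if step.rstrip().count(".") > 1:
        defects.append("more than one sentence")
    return defects

print(check_step("The data is verified."))   # actor missing + passive voice
print(check_step("System verifies data."))   # no defects flagged
```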
5.6.1 Quality analysis
In order to evaluate the quality of the tools, a confusion matrix [43] will be used (see
Table 5.5).
The symbols used in Table 5.5 are described below:
• T (true) - a tool answer that is consistent with the expert decision
• F (false) - a tool answer that is inconsistent with the expert decision
Table 5.5: Confusion matrix

                           Expert answer
                           Yes      No
System answer   Yes        TP       FP
                No         FN       TN
• P (positive) - a positive system answer (defect occurs)
• N (negative) - a negative system answer (defect does not occur)
On the basis of the confusion matrix, the following metrics can be calculated:
• Accuracy (AC) - the proportion of the total number of predictions that were cor-
rect, calculated using Equation 5.1:

    AC = (TP + TN) / (TP + FN + FP + TN)    (5.1)

• True positive rate (TP rate) - the proportion of positive cases that were correctly
identified, calculated using Equation 5.2:

    TP rate = TP / (TP + FN)    (5.2)

• True negative rate (TN rate) - the proportion of negative cases that were classified
correctly, calculated using Equation 5.3:

    TN rate = TN / (TN + FP)    (5.3)

• Precision (PR) - the proportion of the predicted positive cases that were correct,
calculated using Equation 5.4:

    PR = TP / (TP + FP)    (5.4)
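These metrics are simple ratios over the matrix cells; a minimal sketch with invented counts (not the case-study data):

```python
def confusion_metrics(tp, fn, fp, tn):
    """Metrics of Equations 5.1-5.4 computed from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "AC": (tp + tn) / total,     # Equation 5.1
        "TP_rate": tp / (tp + fn),   # Equation 5.2
        "TN_rate": tn / (tn + fp),   # Equation 5.3
        "PR": tp / (tp + fp),        # Equation 5.4
    }

# Invented counts for illustration.
m = confusion_metrics(tp=8, fn=2, fp=2, tn=88)
print(m["AC"], m["PR"])  # → 0.96 0.8
```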
The referential use-case specification was analysed by the mentioned tools (the
Stanford library was used to perform the NLP processing) in order to find the 10 types
of defects described in [30]. The aggregated values of the above accuracy metrics are
as follows:
• AC = 0.99
• TP rate = 0.97
• TN rate = 0.99
• PR = 0.80
This shows that the developed tools for defect detection are rather good; how-
ever, the precision should be better, so the researchers could conclude that more
investigation is needed in this area.
If researchers worked on their tool using only defect-prone or defect-free
specifications, the results could be distorted and could lead to misleading con-
clusions (e.g. that the tool is significantly better than it is, or that it gives very poor
outcomes). Having access to the typical specification allows researchers to explore
the main difficulties the tool can encounter when working with industrial specifications.
5.6.2 Time analysis
One of the main characteristics of a tool being developed is its efficiency. Not only
developers are interested in this metric, but also the users - e.g. when analysts
have to choose between two tools which give the same results, they would choose
the one which is more efficient. However, it can be hard to assess the efficiency
just by running the tool on arbitrary data, as this may result in a distorted outcome.
The use-cases benchmark gives the opportunity to measure the time required by a tool
to complete its computations in an objective way. It can also be valuable for re-
searchers constructing tools to consider using different third-party components in
their applications.
For instance, in the case of the mentioned defect-detection tool there is a possibil-
ity of using one of the available English grammar parsers. In this case study two of
them will be considered - the Stanford parser and OpenNLP. If one exchanges those
components, the tool will still be able to detect defects, but its efficiency and accu-
racy may change. Therefore, a time analysis was performed to assess how using
those libraries influences the efficiency of the tool. The overall time required to anal-
yse the typical specification² for the tool using the Stanford parser was 53.11 seconds,
and for the one with OpenNLP it was 29.46 seconds. Although the first value
seems large, the examined tool needed on average 280 ms to analyse a single
use-case step, so it should not be a problem to use it in an industrial environment.
Of course, when comparing efficiency, one has to remember that other factors (like
the quality of the results, memory requirements or the time needed for the initial-
ization of the tools) should be taken into account, as they can also have a significant
impact on the tool itself. Therefore, time, memory usage and quality analyses are pre-
sented in Table 5.6. These summarised statistics allow researchers to choose the best
approach for their work.
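A measurement of this kind can be scripted along the following lines; `analyse_step` is a placeholder for one defect-detection pass over a use-case step (e.g. a parser call), not the dissertation's actual tool:

```python
import time

def time_analysis(analyse_step, steps, runs=5):
    """Mean wall-clock time to analyse all steps, plus the per-step mean."""
    totals = []
    for _ in range(runs):
        start = time.perf_counter()
        for step in steps:
            analyse_step(step)
        totals.append(time.perf_counter() - start)
    overall = sum(totals) / runs
    return overall, overall / len(steps)

# Toy stand-in workload: lower-casing 100 identical steps.
overall, per_step = time_analysis(str.lower, ["Student enters password."] * 100)
```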
²Tests were performed for the tools in two versions: with the Stanford and OpenNLP parsers. Each version of the tools analysed the referential specification five times, on a computer with a Pentium
Table 5.6: Case-study results (tools are written in Java, so memory was automatically managed – garbage collection)

Summarised for all components:
Parser     Overall processing time [s]   Mean time per step [s]   Maximal memory utilization [MB]
Stanford   53.11                         0.28                     212
OpenNLP    29.46                         0.16                     338

English grammar parser only:
Parser     Overall processing time [s]   Startup time [s]   Initial memory usage [MB]   Memory usage per processed element [KB]
Stanford   37.01                         0.8                27                          539
OpenNLP    13.58                         15.4               225                         3977

Quality:
Parser     AC     TP rate   TN rate   PR
Stanford   0.99   0.97      0.99      0.80
OpenNLP    0.99   0.83      0.99      0.79
Although this specification describes an abstract system, it exhibits the typi-
cal phenomena of industrial requirements specifications. The conducted case
study shows that the developed referential use-case specification can be used in dif-
ferent ways by both researchers and analysts.
5.7 Conclusions
In this paper an approach to creating a set of referential use-cases-based requirements
specifications was presented.
In order to derive a profile of a typical use-cases-based requirements specifica-
tion, 524 use cases were analysed. The analysis proved that some of the use-case
properties are project-independent (they are observed in most of the projects). These
have been used for creating the referential specification. On the other hand, there
are also a number of properties which depend very much on the author.
In order to present the potential usage of the benchmark specification, a case study
was conducted. Two sets of tools for defect detection in requirements were com-
pared from the point of view of efficiency and accuracy.
In this paper a second release of UCDB is analysed, which was recently extended
from 432 to 524 use cases. A positive observation is that, despite the fact that the
number of use cases increased by 21%, the typical profile did not change much.
Therefore it seems that the typical profile derived from the use cases stored in the
database is stable and should not differ much in future releases.
Core 2 Duo 2.16 GHz processor and 2 GB RAM.
Acknowledgments.
We would like to thank Piotr Godek, Kamil Kwarciak and Maciej Mazur for support-
ing us with industrial use cases.
The research presented at CEE-SET'08, which forms part of this paper, was
financially supported by the Polish Ministry of Science and Higher Education under
grant N516 001 31/0269.
Additional case studies and analyses have been financially supported by the Foun-
dation for Polish Science Ventures Programme, co-financed by the EU European Re-
gional Development Fund.
Chapter 6
Identification of events in use cases
performed by humans
Preface
This chapter contains the paper: Jakub Jurkiewicz, Jerzy Nawrocki, Mirosław
Ochodek, Tomasz Głowacki, HAZOP-based identification of events in use cases. An
empirical study, accepted to the Empirical Software Engineering Journal, DOI:
10.1007/s10664-013-9277-5.
My contribution to this chapter included the following tasks: (1) elaboration of
the H4U method; (2) conducting the experiments with students and with IT profes-
sionals; (3) analysis of the results of the experiments; (4) interpretation of the results
of the experiments.
Context:
In order to fully evaluate the automatic method of identification of
events in use cases presented in Chapter 7, the effectiveness and efficiency of man-
ual approaches had to be evaluated first. No existing method aimed at identifi-
cation of events was found; therefore, a method based on the HAZOP approach
is proposed. The method is compared to an ad hoc approach. The accuracy and review
speed of both methods are evaluated in experiments with students and IT profes-
sionals.
Objective: The goal of this study was to check (1) whether HAZOP-based event identifica-
tion is more effective than an ad hoc review and (2) what the review speed of these two
approaches is.
Method: Two controlled experiments were conducted in order to evaluate the ad hoc
approach and the H4U method of event identification. The first experiment included 18
students, while the second experiment was conducted with the help of 64 profes-
sionals. In both cases, the accuracy and review speed of the investigated methods
were measured and analysed. Moreover, the usage of HAZOP keywords was anal-
ysed. In both experiments, a benchmark specification based on use cases was used.
Results: The first experiment with students showed that a HAZOP-based review is
more effective in event identification than an ad hoc review and this result is statis-
tically significant. However, the reviewing speed of HAZOP-based reviews is lower.
The second experiment with IT professionals confirmed these results. These experi-
ments also showed that event completeness is hard to achieve. On average it ranged
from 0.15 to 0.26.
Conclusions: HAZOP-based identification of events in use cases is a useful alter-
native to ad hoc reviews. It can achieve higher event completeness at the cost of an
increase in effort.
6.1 Introduction
Use cases introduced by [59] have gained a lot of attention in the software engineer-
ing community since their introduction in the 1980s ([92]). Roughly speaking, a use
case is a sequence of steps augmented by a list of events and the alternative steps
associated with them that respond to those events. Both the steps and the events
are written in a natural language. This makes reading and understanding software
requirements specification easier, especially if the readers are IT laymen.
One of the main attributes that can be assigned to use cases (and any other form
of requirements) is completeness ([55]). In the context of use cases, completeness
can have several meanings: e.g. functional completeness (the set of use cases covers
all the functional features of the system) or formal completeness (there are no TBDs
— [55]). In this paper the main concern is event-completeness: requirements spec-
ification presented in the form of use cases is event-complete if all the events that
can occur during the execution of the use cases are named in this specification.
[114] indicate that missing error handling conditions account for most software
project rework. This increases project costs and also leads to delayed delivery. In
the context of use cases, these error handling conditions relate directly to use-case
events. Moreover, many authors ([83, 120, 28, 35, 9]) indicate that the list of events
should be complete, e.g. [9] propose a pattern called Exhaustive Alternatives which
states: Capture all failures and alternatives that must be handled in the use case. Thus,
one needs an effective method of checking the event-completeness of a given set
of use cases. Such a method would also be useful in the context of the spiral re-
quirements elicitation recommended by [9]. Their approach is based on the breadth
before depth pattern. Firstly, they suggest listing all the actors and their goals, then
adding a main success scenario to each goal. After a pause, used to reflect on the
requirements described thus far, one should continue identifying the events and
adding alternative steps for each event. In this paper, two methods of events iden-
tification are studied experimentally: ad hoc and H4U (HAZOP for Use Cases). The
former does not assume any particular process or tools: The Analyst is given a set of
use case scenarios and he/she has to identify the possible events in any way he/she
prefers. The second method is based on the HAZOP review process ([102]).
The paper aims at answering the following questions: 1) What level of complete-
ness can one expect when performing events identification (with H4U and ad hoc
approaches)? 2) How should we perform events identification in order to obtain the
maximum completeness? 3) How can we minimize the effort necessary to obtain
the maximum level of completeness?
The paper starts with a short presentation of the 9-step procedure to elicit re-
quirements specification in Section 6.2. That procedure is an implementation of
the breadth before depth strategy recommended by Adolph et al., and it shows a
context for events identification (however, other methods of requirements specifi-
cation based on use cases can also benefit from H4U). Details of the H4U method
along with the example are presented in Section 6.3. To evaluate H4U against ad
hoc review, two controlled experiments have been conducted: with students and
with practitioners. The design of the experiments and their results are presented in
Section 6.4. Both experiments focus on individual reviews. An interesting aspect of
the conducted experiments is usage of HAZOP keywords. Those data are presented
in Section 6.5. Related work is discussed in Section 6.6.
6.2 An evolutionary approach to writing use cases
6.2.1 Overview of use cases
A use case consists of five main elements (see Figure 6.1): (1) a header including
the name of a use case; the name indicates the goal of the interaction; (2) a main
scenario describing the interaction between the actor and the described system;
the main scenario should present the most typical sequence of activities leading to
achievement of the goal of the interaction; (3) extensions consisting of (4) events
that can occur during the execution of the main scenario and (5) alternative steps
taken in response to the events. The focus of this paper is on the fourth element of
use cases, i.e., the events.
Moreover, use cases can specify requirements on different levels: summary goals,
user goals, subfunctions ([31]). The user goals level is the most important one as it
shows the main interactions between the user and the system. Events in use cases at
this level should relate either to decisions made by the user regarding the path taken
to obtain his/her goal or obstacles which the user can encounter when performing
actions leading him/her to achieving his/her goal. Use cases present functional be-
haviour requirements and should not include information related to architectural
decisions or non-functional requirements (NFRs), such as events related to the per-
formance or to internal architectural components. Non-functional requirements are
specified separately using, for instance, ISO 25010 characteristics ([56]).
As with every other element, the form and the quality of events depend heavily
on the skills of the author of a use case. However, some experts ([9]) have proposed
best practices concerning writing high quality use-case events. The most impor-
tant practice is called Detectable Conditions. It suggests describing only those events
which can be detected by the system. If the system cannot detect an event, then it
cannot react to it; hence, specifying those events is useless as they can blur the use
case.
ID: UC1
Name: Edit a blog post
Actor: Author
Main scenario:
1. Author opens a blog post.
2. Author chooses the edit option.
3. System prompts for entering the changes.
4. Author enters the changes to the post.
5. Author confirms the changes.
6. System stores the changes.
Extensions:
2.A. Author does not have permissions to edit the given post.
2.A.1. System displays an error message.
Figure 6.1: An exemplary use case with a header (1), main scenario (2), extensions (3) consisting of an event (4) and an alternative sequence of steps (5).
6.2.2 Process of use-case elicitation
There are two contexts in which event identification is useful: review of use cases
and elicitation of use cases.
In the context of use-case review, a reviewer (or an inspection team) is given a
set of use cases and he/she tries to determine if those use cases are event-complete.
In order to do this, one could identify a set of events associated with each step and
check if every event from this set has its counterpart in the specification under re-
view.
In the context of use case elicitation, the practice of spiral development sug-
gested by [9] can be very useful. It can be supported by another practice called
breadth before depth: first an overview of use cases should be developed (almost
all the use cases should be named), and detailed description should be added at the
next stage. These two practices go well with the approach proposed by [31]: “Expand
your use cases into Scenario Plus Fragments by describing the main success scenario
of each use case, naming the various conditions that can occur, and then fleshing out
some of the more important ones”.
In our work the patterns of spiral development and breadth before depth have
been elaborated into a 9-step process for requirements specification elicitation. That
process supports the XPrince methodology ([90]) and is presented in Figure 6.2 as a
use case. Description of the problem, which is the main outcome of Step 1, is based
on RUP’s project statement ([75]). The business process described in Step 2 can be
extended if necessary with other forms of knowledge description, such as examples
of information objects (e.g. Invoice) if the domain requires this level of detail. A
context diagram created in Step 3 allows better understanding of the overview of the
system, the actors interacting with the system, and the interfaces which the system
will need to use. The use-case diagrams developed in Step 4 present the goals of the
actors. The domain objects, identified in Step 5, are the objects the system will
manipulate or present to the user. These objects represent the physical objects of
the real world. Having those objects, one can check the completeness of the set of use
cases from the point of view of CRUD (Create, Retrieve, Update, Delete) operations ([84]). In
Step 6 each use case manipulating a domain object is classified as C, R, U or D, and
if any of those letters is missing for a given domain object, then the analyst should
provide an explanation or extend the set of use cases to include the missing opera-
tion (this kind of analysis was proposed by [121]). In Step 7 a main scenario for each
use case is provided. Events which may occur when main scenarios are executed
are identified in Step 8; here the proposed method (H4U) is used. Step 9 completes
each use case with alternative steps assigned to each event.
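The CRUD completeness check of Step 6 can be sketched in a few lines. The function and the coverage mapping below are hypothetical, assumed only for illustration; the paper does not prescribe any particular tooling.

```python
# Sketch of the CRUD completeness check from Step 6 (hypothetical data model):
# each domain object maps to the set of operations (C, R, U, D) covered by the
# use cases that manipulate it; any missing letter must be explained by the
# analyst or covered by an additional use case.

def missing_crud(coverage):
    """Return, per domain object, the CRUD operations not covered by any use case."""
    full = {"C", "R", "U", "D"}
    return {obj: sorted(full - ops) for obj, ops in coverage.items() if ops != full}

coverage = {
    "BlogPost": {"C", "R", "U", "D"},   # fully covered by existing use cases
    "Comment":  {"C", "R"},             # no update/delete use case yet
}
print(missing_crud(coverage))  # {'Comment': ['D', 'U']}
```

A non-empty result flags domain objects for which the analyst should either extend the set of use cases or document why the operation is out of scope.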
From the above it follows that whenever the analyst wants to review use cases
or elicit them in a systematic way he or she will be confronted with the problem of
events identification.
Name: Elicit functional use cases
Actor: Analyst
Main scenario:
1. Analyst describes the problem, its implications and the proposed solution.
2. Analyst acquires domain knowledge and presents it as a set of business processes.
3. Analyst outlines the solution with a context diagram.
4. Analyst extends the context diagram to a use-case diagram.
5. Analyst identifies domain objects through analysis of domain knowledge described in step 2.
6. Analyst checks the completeness of the use cases operating on the domain objects.
7. Analyst provides a main scenario for each use case.
8. Analyst annotates each use-case step with events that may possibly occur when the step is executed.
9. Analyst provides for each event an alternative sequence of steps.
Figure 6.2: The 9-step process of use-case elicitation presented as a use case.
6.2.3 The problem
In this paper the focus is on step #8, identification of events which may occur when
the main scenario is executed. To be more precise, one can formulate the problem
in the following way:
Problem of events identification: Given a step of a use case, identify a set of events
that may occur when the step is executed. Identify all possible events assuming that
the system is formally correct and exclude the events related to non-functional aspects
of the system.1
Obviously, the events unrelated to the step, but occurring in parallel to it just by
chance, should be excluded — we are only interested in the cause-effect relationship
between an event and a step in the main scenario that causes the change in the
executed scenario.
There can be several approaches to the problem of events identification. Typi-
cally events in use cases are identified as a result of an ad hoc process by an analyst
or customer who try to guess what kind of events or exceptions can occur while use-
case scenarios are being performed. However, a special approach, oriented strictly
towards the problem of events identification could also be used. An example of such
a method is H4U (discussed in detail in Section 6.3). Based on this method the
following goal arises: Analyze the methods of event identification for the purpose of
evaluation with respect to accuracy and speed from the point of view of an analyst.
1 In order to facilitate the identification of events in the experiments presented in Section 6.4, we gave the participants the freedom to identify any kinds of events they wanted to. However, it resulted in a visible number of events that had to be rejected from the analysis.
6.3 HAZOP based use-case review
The proposed H4U method (short for HAZOP for Use Cases) for events identification
is based on a well-known review approach for mission-critical systems called HAZOP
([102]). The sections below describe the details of the H4U method and discuss how
it differs from the original HAZOP approach.
6.3.1 HAZOP in a nutshell
HAZOP (Hazard and Operability Study) is a structured technique for the identifi-
cation and analysis of hazards in existing or planned processes and designs ([29,
76, 122, 102]). Moreover, HAZOP allows for the distinguishing of possible causes of
hazards and their implications, together with available or future protection steps.
The method was initially intended for the analysis of chemical plants. Later it was
adapted for analysis of computer-based systems.
The method involves a team of experts who use special keywords that guide
them through identification of so called deviations (deviations are counterparts of
events in use-case steps). There are two kinds of keywords: primary and secondary.
Primary keywords refer to attributes (Temperature, Pressure, etc.). Secondary key-
words indicate deviations (NO, MORE, etc.). Considering a fragment of a chemical
or IT system, the inspection team of experts tries to interpret a pair of keywords
(e.g., Pressure - MORE could be interpreted as too much pressure in a pipe).
HAZOP uses the following secondary keywords ([102]):
• NO: the design intent is not achieved at all,
• MORE: an attribute or characteristic exceeds the design intent,
• LESS: an attribute or characteristic is below the design intent,
• AS WELL AS: the correct design intent occurs, but there are additional ele-
ments present,
• PART OF: part of the design intent is obtained,
• REVERSE: the opposite of the design intent occurs,
• OTHER THAN: there are some elements present, but they do not fulfill the
design intent,
• EARLY: the design intent happens earlier than expected,
• LATE: the design intent happens later than expected,
• BEFORE: the design intent happens earlier in a sequence than expected,
• AFTER: the design intent happens later in a sequence than expected.
At the end of a HAZOP session, a report is created (see Figure 6.3). The report
includes, among other things, two columns: Attribute (primary keyword) and Key-
word (secondary keyword), which together describe an event that can occur when
the system is running.
Item | Attribute | Keyword | Cause | Consequences | Protection | Questions
Figure 6.3: HAZOP report sheet
6.3.2 H4U: HAZOP for Use Cases
To solve the problem of events identification, adapting the HAZOP method is
proposed. The first question is how to interpret the terms “design intent” and
“deviation”. The H4U method is targeted at analysis of use cases, therefore:
• design intent is the main scenario of a use case;
• deviation is an event that can appear when the scenario is executed and re-
sults in an alternative sequence of steps.
There are two kinds of deviations: functional and non-functional. Functional devi-
ations from the main scenario are all those events that can appear even when the
system works correctly. An example could be an attempt to buy a book which is not in
the store or paying with an expired credit card. Non-functional deviations follow from
a violation of nonfunctional requirements such as correctness, availability, security,
etc. (see e.g. ISO 25010 characteristics [56]). Examples include user session time-
out, unacceptable slowdown, etc. Those events shouldn’t be specified within use
cases because they are cross-cutting and could appear in almost all the steps, which
would make a use case unreadable. Therefore, the deviations as well as the design
intent should be kept separate as they belong to two different dimensions. Non-
functional deviations could also be identified using HAZOP but, according to the
separation of concerns principle, they should be specified elsewhere and therefore
they are out of the scope of the H4U method.
The previously presented HAZOP secondary keywords are generic keywords used
in process industries with the addition of the last four keywords introduced in [102]
where HAZOP for software is presented. These four keywords were introduced in
order to analyze the timing aspect of software. It was decided not to alter the pre-
sented set of secondary keywords. There are other sets of keywords that could be
used, e.g. the guide words from the SHARD method proposed by [100], however, the
aim of this paper is to investigate how the well known HAZOP method can influence
events identification.
The original HAZOP primary keywords are specific to chemical installations and
they should be replaced with a set which would fit computer-based systems. To
achieve this, a set of 50 industrial use-case events was analyzed by two experts. As
a result the following primary keywords have been identified:
• Input & output data — relates to objects that are exchanged between external
actors and the system;
• System data — relates to objects already stored in the system;
• Time — relates to the “moment of time” when a step is being executed (e.g.,
dependencies between actions, pre- and post- conditions of actions); it does
not relate to non-functional aspects such as performance (e.g. response time)
or security (user session timeout).
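Under the assumption that every primary/secondary keyword pair is treated as a candidate prompt for the reviewer, the checklist implied by the two keyword sets can be sketched as follows (variable names are illustrative, not part of the method's definition):

```python
from itertools import product

# H4U keyword sets as listed in the text: 3 primary and 11 secondary keywords.
PRIMARY = ["Input & output data", "System data", "Time"]
SECONDARY = ["NO", "MORE", "LESS", "AS WELL AS", "PART OF", "REVERSE",
             "OTHER THAN", "EARLY", "LATE", "BEFORE", "AFTER"]

# Each pair is a prompt the reviewer tries to interpret for the current step,
# e.g. ("Input & output data", "NO") -> "Author did not provide data."
pairs = list(product(PRIMARY, SECONDARY))
print(len(pairs))  # 33 candidate prompts per use-case step
```

Not every pair yields a sensible event for a given step; the reviewer records only the interpretable ones in the H4U report table.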
To check the completeness of the above set of primary keywords, a small em-
pirical study has been performed. 300 use-case events have been investigated by
experts in the field, and it was possible to present each of them as a pair consisting of
one of the proposed primary keywords and one of the standard secondary keywords; thus, one can
assume that the proposed set of primary keywords is complete. Since the set of pri-
mary keywords consists of only 3 elements, it is easy to memorize them; therefore
there is no need to include them as a checklist during the review process.
The H4U method can be used in two ways: as an individual review method and
as a team review method. The overall process of the H4U method is presented in
the form of a use case in Figure 6.4. As an output, a report is produced according
to the H4U report table presented in Figure 6.5. The report should consist of the
following elements: Use case ID — ID of a use case, Step number — the number
of the step in the main scenario, Primary keyword — the primary keyword used to
identify an event, Secondary keyword — the secondary keyword used to identify an
event, Event — the identified event.
ID: UC1
Name: Identification of an event
Actors: Reviewer (can also be a Review Team)
Main flow:
1. Facilitator presents the H4U method to the Reviewer.
2. Facilitator provides the Reviewer with a set of main scenarios of use cases.
3. Reviewer reads the main scenarios step by step.
4. Reviewer uses primary and secondary keywords to identify the events and annotates the steps with the events.
5. Reviewer presents the results to the Facilitator.
Figure 6.4: Overall process of the H4U method.
6.3.3 Example
This section presents an example of how the H4U method could be used to identify
events in an exemplary use case from Figure 6.1. An analyst goes through every step
Use case ID | Step number | Primary keyword | Secondary keyword | Event
Figure 6.5: H4U report table.
of the use case and uses the proposed primary and secondary keywords in order to
identify as many events as possible. Let us assume that the analyst is currently ana-
lyzing Step 4: Author enters the changes to the post. Events which might be identified
using the primary and secondary keywords analysis are presented in Figure 6.6.
Use Case ID | Step number | Primary Keyword | Secondary Keyword | Event
UC1 | 4 | Input & output data | NO | Author did not provide data.
UC1 | 4 | Input & output data | MORE | Author exceeds the limit of characters per single post.
UC1 | 4 | Time | LATE | Author changes the post after the allowed time passed.
Figure 6.6: Exemplary results of applying the H4U method to one use-case step.
6.4 Experimental evaluation of individual reviews
In order to evaluate the process of events identification in use cases, we decided to
conduct a series of controlled experiments.
The goal of the experiments was to analyze H4U and ad hoc use-case events
identification; for the purpose of evaluation; with respect to their review speed and
accuracy; from the point of view of both students and experienced IT professionals;
in the context of a controlled experiment performed at university and in industry.
The first experiment (Experiment 1) was conducted in academic settings to in-
vestigate how the structured and ad hoc approaches to events identification support
the work of students of software engineering.
The second experiment (Experiment 2) was conducted with the participation of
professionals, in order to observe how the methods affect the work of experienced
reviewers who work professionally on software development.
Both experiments were conducted according to the same design; therefore, we
will present the details only in the description of Experiment 1, and discuss the dif-
ferences in the description of Experiment 2. The results of both experiments were
independently analyzed by two researchers. Every disagreement was discussed until
a decision about the problematic element was made.
6.4.1 Empirical approximation of maximal set of events
In order to be able to analyze and interpret the results of the experiments, a maximal
set of all possible events in the software requirements specification used in the
experiments had to be agreed upon. After removing irrelevant events, such a set is
to be used to assess the accuracy of the events identification process. The procedure
for removing irrelevant events will be discussed in the following sections.
Three approaches to defining the reference set of all possible (acceptable and
irrelevant) events were considered:
• A1: include events that were defined by the authors of the benchmark specifica-
tion;
• A2: the authors of the paper could brainstorm in order to identify their own
reference set of possible events;
• A3: construct the approximate maximal set in an empirical way, based on all
distinct events identified by the participants in the experiments (after review-
ing them).
We decided to choose approach A3, because we assumed that a group contain-
ing around 100 people would be able to generate the most exhaustive set of events.
If we had decided to accept approach A1 or A2, we would have had the potential
problem of interpreting the proposals of events provided by the participants that
were reasonable (events that could possibly appear in this kind of system) but nei-
ther the authors of the paper nor the authors of the use-case benchmark could iden-
tify them. After the experiment, it turned out that it was a good choice, because the
set of events defined by the participants of the experiments contained all the events
that were originally present in the benchmark use cases.
Rejecting undetectable events
As it was mentioned in Section 6.2, only the events which may be detected by a sys-
tem are considered valid. For example, the event “system displayed a wrong form” is
apparent to the user, but the system cannot detect it and react to it. These kinds
of events were very often related to possible defects (bugs) in the system. All the
participants (in both experiments) proposed in total 4024 events during the experi-
ments. Based on the initial analysis, 1593 events were rejected as being undetectable
or invalid, therefore 2431 events were taken into consideration in further analysis.
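As a quick sanity check on the figures above, the filtering reduces to simple arithmetic over the reported counts:

```python
# Counts reported in the text: 4024 events proposed in total, of which 1593
# were rejected as undetectable or invalid, leaving 2431 for further analysis.
proposed = 4024
rejected = 1593
retained = proposed - rejected
print(retained)  # 2431
```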
Handling duplicated events
After the experiments, we analyzed every event in the context of the use case it be-
longs to and the use-case step by which it is triggered. As a result an abstract class
was assigned to such an event; the class represents its semantic meaning (the same
abstract event can be expressed in different ways using natural language). It was
noticed that some participants had included a number of abstract classes of events
in one event description. Moreover, during the analysis it appeared that the partici-
pants had employed different approaches to assign events to use-case steps. Let us
consider an exemplary part of the use-case scenario presented in Figure 6.7.
4. …
5. Candidate fills and submits the form.
6. System verifies if data is correct.
7. System informs Candidate that account has been created.
8. …
Figure 6.7: Example of groups of steps in a use case
An exemplary list of events that could be identified by participants, together with
the identifiers of steps the events were assigned to, are presented in Figure 6.8. It
can be noticed that all the proposed events describe the problem of missing data,
and all of them could be assigned to each of the three steps. Therefore, it does not
really matter to which of the three steps the event is assigned; however, all of
them should be treated as the same event. We have to mention that some of the
participants in similar cases assigned the same type of event to all of the steps,
i.e. they wrote down three separate events. Unfortunately, this could visibly influ-
ence the results of the experiments, as participants who had written down the same
events many times for closely related steps would have a higher number of identi-
fied events, even though the events carried the same meaning.
To solve this problem we decided to identify groups of consecutive steps that are
indistinguishable in terms of events that can be assigned to them. It was decided to
treat this kind of group of steps as a whole: if a participant assigned an event to any
of the steps from the group it was treated as if the event was assigned to the whole
group; if the participant assigned an event to more than one step from the group it
was considered as a single assignment to the whole group. The requirements specifi-
cation, being an object of the experiments, consisted of 33 use cases and 113 groups
of steps (and 156 individual steps).
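The grouping rule described above can be sketched as a small function. The data shapes (event class names, group identifiers, the `step_to_group` mapping) are hypothetical, chosen only to mirror the example in Figure 6.7:

```python
# Sketch of collapsing event assignments from individual steps to step groups:
# an event class assigned to several steps of the same group counts as a single
# assignment to the whole group.

def collapse_to_groups(assignments, step_to_group):
    """assignments: iterable of (event_class, step) pairs.
    Returns the set of distinct (event_class, group) assignments."""
    return {(event, step_to_group[step]) for event, step in assignments}

step_to_group = {5: "G1", 6: "G1", 7: "G1"}  # steps 5-7 form one group
assignments = [("MissingData", 5), ("MissingData", 6), ("MissingData", 7)]
print(collapse_to_groups(assignments, step_to_group))  # {('MissingData', 'G1')}
```

Using a set makes the three redundant per-step assignments collapse into one, so participants who repeated the same event for closely related steps are not over-credited.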
After this phase of analysis, 2510 abstract classes of events assigned to event
descriptions were identified.
Event | Step number
Candidate doesn't provide all necessary data | 5
User didn't provide all required data | 6
Candidate provided no data | 7
Figure 6.8: Example of events that could be identified for a group of use-case steps presented in Figure 6.7.
Excluding non-use-case events
As a result of the analysis of the abstract classes of events assigned to groups of
steps, we observed that some classes of valid events (which could be detected by
the system) should not be included in descriptions of use cases on the user goals level
or should be excluded from the analysis.
Most of such events related to non-functional requirements, which should be
specified separately from use cases (see the explanation in Section 6.3.2).
The second group of events that we decided to exclude from the analysis ex-
tended main scenarios with some auxiliary functions. The participants in the ex-
periments could use their imagination to propose any number of such extensions,
which would make it extremely difficult to decide whether or not we should include
them in the analysis.
The rejected classes of events are presented in Table 6.1, together with examples
and the justification for excluding them.
Events included into the maximal set of events
In the end 1550 abstract classes of events assigned to events descriptions were left
for construction of the approximation of the maximal set of events. The list of ac-
cepted events classes is presented in Table 6.2.
Table 6.1: Abstract classes of events that should not be included in use-case descriptions (they were excluded from the analysis). Each entry gives the class name, the number of such events in parentheses, an explanation, and an example.
• Authorization Problem (42): Events concerning authentication and authorization problems. They should not be included in alternative flows of a user-level use case, but handled separately as NFRs. Example: User does not have permission to add a new thread.
• Concurrency of Operations (41): Events related to concurrency problems at the level of internal system operations (e.g., concurrent modification of the same object). This kind of requirement relates to NFRs (transactional operations). There are also concurrent operations that relate to business rules (e.g., while one is applying to a major the admission is closed); this kind of event belongs to another abstract class of events and was accepted. Example: Candidate data was changed by the administrator before the candidate submitted modifications.
• Connection Problem (57): Events describing problems with the connection between a user and the system. Use cases should not include technical details; this should be described as an NFR. The system is not able to respond to such an event. Example: The data could not be sent to the server.
• Internal System Problem (339): Events describing failures inside the system. These events should not be included in user-level use cases. Example: System was not able to store the data in the database.
• Linked Data (7): Problems related to the consistency of data stored in the system. This should rather be described in the data model, business rules, or NFRs. Example: Administrator selects an item to be removed that is connected to other items in the system.
• Other Action (226): Events expressing the actor’s will to perform a different action than the action described in a use-case step. Any participant could come up with an enormous number of such events; we would not be able to assess if they are reasonable and required by the customer. Example: User wants to view the thread instead of adding a new one.
• Timeout (68): Events referring to exceeding the time given to complete a request (e.g., user session timeout). These should be described as NFRs. Example: It takes the candidate too long to fill the form.
• Withdraw (180): Events describing cases when a user wants to resign from completing a running operation. These events could be assigned to almost every step of a use case, which would make the use case cluttered. Example: User does not want to provide the data, so he/she quits.
6.4.2 Experiment 1: Student reviewers
Experiment design
The independent variable considered in the experiment was a method used to iden-
tify events in use cases (two values were considered: ad hoc and H4U). Two depen-
dent variables were defined: the speed of the review process and its accuracy. The
Table 6.2: Abstract classes of events that should be included in use-case descriptions (they were included in the analysis). Each entry gives the class name, the number of such events in parentheses, an explanation, and an example.
• After Deadline (12): The operation is not allowed, because it is performed after the deadline. Example: The admission for the chosen major is closed.
• Algorithm Runtime Error (9): Problems with the execution of an admission algorithm, which is defined by a user. Example: Unable to execute the algorithm.
• Alternative Response (89): Events describing the alternative response of an actor (usually, a negative version of the step in the main scenario). Example: The committee rejects the application.
• Connection Between Actors (65): Problems with communication between the system and external actors, i.e. actors representing other systems or devices. The system is able to respond to such an event. Example: Payment system does not respond; the system can propose to use a different method of payment.
• Lack of Confirmation (12): A user does not confirm his/her action. Example: Administrator does not confirm removing the post.
• Missing Data (481): Events concerning missing input data. Example: Candidate did not provide personal data.
• No Data Defined (259): System does not have the requested data. Example: There are no majors available.
• Payment Events (18): Events identified in a use case “Assign an application fee to a major.” They relate either to credit card payment problems (expired card or insufficient resources) or choice of an alternative payment method. Example: Credit card has expired.
• Redo Forbidden (103): Actor would like to redo some operation that can be performed only once (e.g., registering for the second time). Example: Candidate tries to register for the second time.
• Wrong Data or Incorrect Data (502): Events related to the wrong (in the sense of content) or incorrect (in the sense of syntactic format) input data. Example: Provided data is incorrect.
speed was measured as the number of steps in the use cases that were analyzed,
divided by the time required to perform the analysis (see Equation 6.1).
Speed(p) = #steps(p) / T   [steps/minute]   (6.1)
Where:
• p is a participant in an experiment;
• #steps(p) is the number of steps reviewed by the participant p;
• T is the total time spent on the review (constant: 60 minutes).
The accuracy was measured as the quotient of the number of events identified by a participant and the total number of distinct events identified by the participants of both experiments in the steps reviewed by a given participant (see Equation 6.2).

Accuracy(p) = Σ_{s=1}^{distance(p)} #events(p, s) / Σ_{s=1}^{distance(p)} #Events(s)   (6.2)
Where:
• p is a participant in an experiment;
• distance(p) is the furthest step in the specification that was analyzed by the participant p;
• #events(p, s) is the number of distinct events identified by the participant p based on the step s;
• #Events(s) is the number of distinct events identified by all participants in both experiments based on the step s.
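The two metrics can be sketched in a few lines of Python. The data below is hypothetical and the function names are illustrative, not part of the original study:

```python
# Sketch of the metrics from Equations 6.1 and 6.2 (illustrative data).
def speed(steps_reviewed: int, minutes: float = 60.0) -> float:
    """Equation 6.1: steps analyzed per minute (T was fixed at 60 min)."""
    return steps_reviewed / minutes

def accuracy(events_by_participant: list[int], events_total: list[int]) -> float:
    """Equation 6.2: events found by one participant up to the furthest step
    they reached, divided by all distinct events known for those steps.
    Both lists are indexed by step s = 1..distance(p)."""
    return sum(events_by_participant) / sum(events_total)

# Hypothetical participant: 120 steps in 60 minutes, events per step below.
print(speed(120))                      # 2.0 steps/minute
print(accuracy([2, 0, 1], [4, 1, 3]))  # 3/8 = 0.375
```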
Participants The participants in the experiment were 18 4th- and 5th-year students
of the Software Engineering programme at Poznan University of Technology. The
students were familiar with the concept of use cases. They had authored use cases
for previous courses in the SE programme curriculum, as well as in the projects they
had participated in.
The participants were randomly assigned to two groups: group G1 was asked to
use the ad hoc method and group G2 to use the H4U method.
Object — Software Requirements Specification The object of the experiment was
a use-case-based software requirements specification (SRS). To mitigate the risk of
using a specific requirements specification (i.e., too easy, or too difficult to analyze),
it was decided to use the benchmark requirements specification² ([14, 13]). This is
an instance of a typical use-case-based SRS, which was derived from the analysis of
524 use cases coming from 16 software development projects.
For the purpose of the experiment the original benchmark specification was purged by removing all the events, so that only bare scenarios remained: the resulting document consisted solely of the use-case headers and main scenarios.
Operation of the experiment
Prepared instrumentation The following instrumentation was provided for the par-
ticipants of the experiment:
• A presentation containing around 30 slides describing the basics of use cases.
The G1 group was presented with the information about the ad hoc approach
²The benchmark requirements specification is available on-line at http://www.ucdb.cs.put.poznan.pl.
to events identification. The information about the H4U method was pre-
sented only to the G2 group. At the end of the presentation, both groups were
informed about the details of the experiment.
• A benchmark specification being the object of the study.
• An editable form where the participants filled in the identified events.
• A questionnaire, prepared in order to identify possible factors which might
influence the results of the experiment.
All the materials were provided in English, as all the classes of the Software Engi-
neering programme are conducted in English.
Execution The procedure of the experiment consisted of the following steps:
1. The supervisor of the experiment gave the Presentation (15 minutes).
2. The participants read the Benchmark Specification and noted all the identified
events.
3. After every 10 minutes the participants were asked to save the file with the events under a different name, and to change the color of the font they were using. This allowed the experimenters to analyze how the set of identified events was changing over time. The students were given 60 minutes to finish their task.
4. After the time was up, the students were asked to fill out the questionnaire.
Data validation After collection of the data, all questionnaires were reviewed. None
were rejected due to incompleteness or errors.
Analysis and interpretation
Descriptive statistics Before proceeding to further analysis, the experimental data was investigated to find and handle outlying observations. After analyzing the descriptive statistics presented in Table 6.3, we did not observe any potentially outlying observations.
The next step in the analysis was the investigation of the sample distributions in order to choose proper statistical hypothesis tests.
The initial observation, based on an analysis of the Q–Q plots, was that the assumption about sample normality might be violated for the speed variable. This suspicion was further confirmed by the Shapiro–Wilk test.³ Therefore, we decided to use non-parametric statistical tests for the speed variable and parametric tests for the accuracy variable.
³The results of the Shapiro–Wilk test (α = 0.05) — Speed G1: W = 0.804, p-value = 0.023 (reject H0), G2: W = 0.860, p-value = 0.096 (do not reject H0); Accuracy G1: W = 0.930, p-value = 0.477 (do not reject H0), G2: W = 0.974, p-value = 0.925 (do not reject H0).
Table 6.3: Descriptive statistics (groups G1 and G2).

            Speed [steps/min]     Accuracy
            G1       G2           G1      G2
#1          2.63     0.35         0.25    0.36
#2          1.83     0.12         0.22    0.44
#3          2.58     0.40         0.18    0.26
#4          2.65     0.50         0.09    0.22
#5          1.40     0.45         0.13    0.26
#6          1.95     0.33         0.09    0.34
#7          2.62     1.35         0.25    0.31
#8          2.60     1.17         0.14    0.05
#9          1.85     1.07         0.19    0.12
N           9        9            9       9
Min         1.40     0.12         0.09    0.05
1st Qu.     1.85     0.35         0.13    0.22
Median      2.58     0.45         0.18    0.26
Mean        2.23     0.64         0.17    0.26
3rd Qu.     2.62     1.07         0.22    0.34
Max         2.65     1.35         0.25    0.44
Hypotheses testing In order to compare the review speed of the H4U method to
the typical ad hoc approach, the following hypotheses were formulated. The null
hypothesis stated that there is no difference regarding the review speed of the H4U
and ad hoc methods:
Ha0 : θSpeed(G1) = θSpeed(G2) (6.3)
The alternative hypothesis stated that the median review speed differs between
these two methods:
Ha1 : θSpeed(G1) ≠ θSpeed(G2) (6.4)
If accuracy is considered, it is important to investigate whether H4U improves
the accuracy of events identification in use cases. Therefore, the following hypothe-
ses were defined:
Hb0 : μAccuracy(G1) = μAccuracy(G2) (6.5)

Hb1 : μAccuracy(G1) < μAccuracy(G2) (6.6)
The Wilcoxon rank-sum test was used to test the hypotheses regarding speed and the t-test to test the hypotheses concerning accuracy, both with the significance level α set to 0.05.
As the result of the testing procedure, both null hypotheses were rejected (p-
values were equal to 4.114e-05 and 3.214e-02).
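The speed comparison can be reproduced from the raw data in Table 6.3. As a sketch (valid here because the pooled sample has no tied values), an exact two-sided rank-sum test can be computed in plain Python by enumerating all C(18, 9) equally likely assignments of ranks to group G1; since the two speed samples do not overlap at all, this reproduces the reported p-value of 4.114e-05:

```python
from itertools import combinations
from math import comb

# Review speeds from Table 6.3 (G1 = ad hoc, G2 = H4U).
g1 = [2.63, 1.83, 2.58, 2.65, 1.40, 1.95, 2.62, 2.60, 1.85]
g2 = [0.35, 0.12, 0.40, 0.50, 0.45, 0.33, 1.35, 1.17, 1.07]

pooled = sorted(g1 + g2)
rank = {v: i + 1 for i, v in enumerate(pooled)}   # valid: no tied values here
observed = sum(rank[v] for v in g1)               # rank sum of group G1
n, n1 = len(pooled), len(g1)
expected = n1 * (n + 1) / 2                       # mean rank sum under H0

# Exact null distribution: every choice of n1 out of n ranks is equally likely.
extreme = sum(
    abs(sum(c) - expected) >= abs(observed - expected)
    for c in combinations(range(1, n + 1), n1)
)
p_two_sided = extreme / comb(n, n1)
print(p_two_sided)   # ~4.11e-05, i.e. reject Ha0 at alpha = 0.05
```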
Power analysis The observed normalized effect size⁴ expressed as Cohen’s d coefficient ([32]) was “large”⁵ (Cohen’s d for review speed was equal to 3.49; for accuracy it was equal to 0.96).
Sensitivity analysis, which computes the required effect size given the assumed significance level α, statistical power (1-β), and sample size, revealed that in order to
achieve a high power for the performed tests (0.95) the effect size should be equal
to at least 1.62 (one-tailed t-test) and 1.97 (two-tailed Wilcoxon rank-sum test).
The observed effect size for speed was visibly greater than 1.97, while the one
observed for accuracy was lower than 1.62 (the calculated post hoc power was equal
to 0.62).
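The effect sizes can be recomputed from Table 6.3 with the pooled-standard-deviation form of Cohen’s d. This is a sketch working from the rounded table entries, so the accuracy figure comes out near, but not exactly at, the reported 0.96:

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d with the pooled sample standard deviation (equal group sizes)."""
    pooled_sd = sqrt((stdev(a) ** 2 + stdev(b) ** 2) / 2)
    return (mean(a) - mean(b)) / pooled_sd

speed_g1 = [2.63, 1.83, 2.58, 2.65, 1.40, 1.95, 2.62, 2.60, 1.85]
speed_g2 = [0.35, 0.12, 0.40, 0.50, 0.45, 0.33, 1.35, 1.17, 1.07]
acc_g1 = [0.25, 0.22, 0.18, 0.09, 0.13, 0.09, 0.25, 0.14, 0.19]
acc_g2 = [0.36, 0.44, 0.26, 0.22, 0.26, 0.34, 0.31, 0.05, 0.12]

print(round(cohens_d(speed_g1, speed_g2), 2))  # 3.49 -> "large" per [32]
print(round(cohens_d(acc_g2, acc_g1), 2))      # ~0.95, also "large"
```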
Interpretation Based on the results of the experiment, one could conclude that the
H4U method can help to achieve higher accuracy of identification of events in use-
case scenarios when used by junior software engineers.
Unfortunately, the higher accuracy of the H4U method was achieved by visibly
lowering the speed of the review process (the cumulative number of steps analyzed
by both groups in time are presented in Figure 6.9).
Threats to validity
The following threats to validity need to be taken into account when interpreting the
results of the experiment.
First, it needs to be noted that only 18 (9 for each group) participants took part
in the experiment, which constitutes a small sample.
Second, the participants were students; hence, it might be impossible to gener-
alize the results to the group of professional analysts. However, most of the partici-
pants had some practical experience in software houses, so the results might be gen-
eralized to junior developers. It is also noteworthy that the participants who used
the H4U method were not familiar with this method prior to the experiment; hence,
they were learning this approach during the experiment. This factor could possibly
have a negative influence on the review speed of the group using the H4U method.
An additional threat to conclusion validity is posed by the disproportion be-
tween the number of steps analyzed by participants using different methods. The
higher speed of ad hoc groups enabled them to analyze a greater number of steps
than participants using H4U. As a result the number of participants using H4U and
⁴Please note that the retrospectively calculated effect size only approximates the real effect size in the populations from which the samples were drawn. More information can be found in [127].
⁵According to [32] effect size is perceived as “small” if the value of d is equal to 0.2, as “medium” if the value of d is equal to 0.5, and as “large” if d is equal to 0.8.
[Figure 6.9 consists of four panels — “H4U Univ”, “Ad hoc Univ”, “H4U Ind”, “Ad hoc Ind” — each plotting the cumulative number of processed steps (0–200) against time (10–60 min).]
Figure 6.9: The cumulative number of steps processed by reviewers in time (Univ — Experiment1; Ind — Experiment 2).
contributing proposals of events was decreasing in consecutive steps. As presented in Figure 6.10, this could have led to an increase in the accuracy of the ad hoc groups.
[Plot titled “Experiments limited to the number of steps”: the mean accuracy ratio H4U/AdHoc (1.0–2.5) against the number of steps (20–153); legend: Ind, Univ.]
Figure 6.10: The ratio between the mean accuracy of H4U and Ad hoc if the scope of experimentswas limited to a given number of steps.
6.4.3 Experiment 2: Experienced reviewers
Experiment design
Participants The participants in the second experiment were 82 professionals, with
experience ranging from 1 to 20 years (mean 6 years, standard deviation 4 years).
The group was heterogeneous with respect to roles performed by the participants
in projects (software developers & designers, architects, analysts, and project man-
agers). Most of the participants had experience with use cases: 32% had used them
in many projects, 24% had used them in at least one project, 22% mentioned only a
couple of use cases. There were also participants who did not have any experience
related to use-case-based requirements: 14% had heard about use cases but had not
used them, 8% had never heard about use cases.
The participants were randomly assigned to two groups: group G3 was asked to use the ad hoc method and group G4 to use the H4U method.
Operation of the experiment
Prepared instrumentation The materials were translated into Polish, because we could not assume that all participants were fluent in English. In addition, a short survey was prepared to collect demographic information as well as to obtain opinions about the H4U method from the professionals.
Data validation After collection of the data, all questionnaires were reviewed. We
decided to reject 18 observations because of their incompleteness. Unfortunately,
some of the participants were interrupted by their duties and had to leave before
the end of the experiment. Some decided to identify events and write alternative
scenarios to practice the whole use-case development process, which resulted in
less time spent on the experiment task of solely identifying events.
Analysis and interpretation
Descriptive statistics The experiment data was investigated to find and handle out-
lying observations. After analyzing the descriptive statistics presented in Table 6.4,
we found 1 potentially outlying observation in group G3 with respect to review speed.
However, after deeper investigation of the work, we did not find any justification for
rejecting it (the ad hoc process does not impose any specific approach to events
identification, so the review speed can differ visibly).
In the next step of the analysis, sample distributions were investigated in order to choose proper statistical hypothesis tests.
Table 6.4: Descriptive statistics (groups G3 and G4).

            Speed [steps/min]     Accuracy
            G3       G4           G3      G4
#1          2.18     2.32         0.19    0.13
#2          2.07     0.85         0.04    0.36
#3          2.20     0.38         0.13    0.45
#4          2.12     2.02         0.13    0.05
#5          0.68     1.97         0.14    0.03
#6          2.17     0.78         0.16    0.23
#7          1.90     0.68         0.11    0.09
#8          0.87     0.38         0.21    0.08
#9          2.23     1.07         0.09    0.10
#10         2.30     1.35         0.24    0.28
#11         2.07     2.07         0.08    0.11
#12         0.68     0.48         0.22    0.23
#13         2.22     1.07         0.09    0.07
#14         2.07     0.48         0.15    0.33
#15         2.02     0.97         0.25    0.19
#16         2.20     0.38         0.28    0.14
#17         2.18     0.42         0.07    0.23
#18         2.02     0.48         0.14    0.09
#19         2.17     0.68         0.22    0.35
#20         1.90     0.55         0.15    0.18
#21         1.07     0.78         0.03    0.26
#22         2.02     0.78         0.07    0.17
#23         2.18     0.78         0.21    0.36
#24         1.97     1.90         0.18    0.14
#25         1.60     1.48         0.12    0.08
#26         1.43     1.83         0.11    0.07
#27         2.02     1.17         0.05    0.31
#28         0.42     0.48         0.15    0.26
#29         0.55     1.07         0.20    0.14
#30         0.87     0.68         0.26    0.15
#31         2.08     2.22         0.11    0.23
#32         2.17     0.85         0.12    0.30
N           32       32           32      32
Min         0.42     0.38         0.03    0.03
1st Qu.     1.56     0.53         0.10    0.10
Median      2.05     0.82         0.14    0.17
Mean        1.77     1.04         0.15    0.19
3rd Qu.     2.17     1.38         0.20    0.26
Max         2.30     2.32         0.28    0.45
The observation made based on the analysis of the Q–Q plots was that the assumption about sample normality might have been violated for the speed variable. This suspicion was confirmed by the Shapiro–Wilk test.⁶ Therefore, again we de-
⁶The results of the Shapiro–Wilk test (α = 0.05) — Speed G3: W = 0.742, p-value = 3.953e-06 (reject H0), G4: W = 0.868, p-value = 1.056e-03 (reject H0); Accuracy G3: W = 0.973, p-value = 0.577 (do not reject H0), G4: W = 0.951, p-value = 0.153 (do not reject H0).
cided to use the same non-parametric statistical test to test the hypotheses regard-
ing speed and the same parametric statistical test to test the hypotheses concerning
accuracy.
Hypotheses testing In order to compare the review speed and accuracy of the H4U
method with the ad hoc approach, a set of hypotheses similar to those formulated
in Experiment 1 was defined.
The null hypothesis related to review speed stated that there is no difference
regarding the review speed of the H4U and ad hoc methods:
Hc0 : θSpeed(G3) = θSpeed(G4) (6.7)
The alternative hypothesis stated that the median review speed differs between
these two methods:
Hc1 : θSpeed(G3) ≠ θSpeed(G4) (6.8)
In addition, the following hypotheses regarding accuracy were defined:
Hd0 : μAccuracy(G3) = μAccuracy(G4) (6.9)

Hd1 : μAccuracy(G3) < μAccuracy(G4) (6.10)
As a result of performing the testing procedure, the null hypothesis Hc0 regarding review speed was rejected (p-value was equal to 5.055e-05). The second null hypothesis Hd0 concerning accuracy was also rejected (p-value was equal to 2.477e-02).
Power analysis The observed normalized effect size expressed as Cohen’s d coeffi-
cient was “large” for review speed (Cohen’s d was equal to 1.21) and “medium” for
accuracy (Cohen’s d was equal to 0.50).
The sensitivity analysis revealed that in order to achieve a high power for the
performed tests (0.95), the effect size should be at least equal to 0.83 (one-tailed
t-test) and 0.99 (two-tailed Wilcoxon rank-sum test).
The effect size observed for the accuracy was smaller than 0.83. The calculated
post-hoc power of the test was equal to 0.63. The effect size observed for the speed
was higher than 0.99.
Interpretation Based on the results of the experiment, one could conclude that the
H4U method can help to achieve higher accuracy of identification of events in use-case scenarios when used by software engineers (not necessarily analysts).
Unfortunately, again the higher accuracy of the H4U method was achieved by lowering the speed of the review process: the average review speed of the H4U method was 1.7-2.5 times lower than that of the ad hoc approach (the cumulative number of steps analyzed by both groups over time is presented in Figure 6.9).
In both experiments, we have observed that the students outperformed the pro-
fessionals in terms of accuracy. This observation was unexpected. We suspect that
it is caused by the fact that the students were more up to date with the topic of use
cases — they were learning about use cases and practicing during the courses.
Another observation made during both experiments is that the accuracy of events
identification is generally low if one considers individual reviewers. The observed
mean accuracy ranged from 0.15 (for professionals using the ad hoc method) to 0.26
(for students using H4U). This means that a single reviewer was “on-average” able to
detect only a quarter of all the events. Therefore, there is a need for a better method
of supporting identification of events.
When it comes to the review speed of the approaches, a non-structured ad hoc
approach seems visibly more efficient. However, in the survey conducted after the
experiment, more participants using the ad hoc approach had the feeling of solving
tasks in a hurry (65%) than did users of the H4U method (56%).
Although the H4U method forces users to follow a specific procedure, most of
the participants did not perceive it as difficult (∼68% “definitely no” and “no”; ∼19%
“difficult to say”; ∼13% “definitely yes” and “yes”).
In addition, more than half of the participants stated that they would like to use
the method in their next projects (∼53% “definitely yes” and “yes”; ∼34% “difficult
to say”; ∼13% “definitely no” and “no”).
Threats to validity
Again, there are some threats to the validity of the study that need to be taken into
consideration.
First, there is a threat related to experimental mortality. Among 82 participants
18 did not complete the experiment task, because they were disturbed by their pro-
fessional duties.
The threat to conclusion validity regarding the disproportion between the num-
ber of steps analyzed by both groups seems less serious than in Experiment 1 (see
Figure 6.10).
There is also a threat related to external validity. Although the experiment in-
volved professionals, only some of them were fulfilling the role of analyst in software
development projects.
It also needs to be noted again that the participants who used the H4U method
were not familiar with this method prior to the experiment; hence, they were learn-
ing this approach during the experiment. This factor could possibly have a negative
influence on the review speed of the group using the H4U method.
6.5 Usage of HAZOP keywords
The proposed H4U method includes all of the 11 secondary keywords as suggested
by [102]. However, the analysis of the results of the conducted experiments leads to
the conclusion that not all of the keywords are used equally. The identified events in
both experiments were divided into three sets. The first set contained events which
could be detected by the system (according to the Detectable Conditions pattern
— [9]) and which were proper use-case events, the second set consisted of events
which could be detected by the system, but were not correct use-case events (e.g.
they were non-functional requirements), while the third set contained events which could not be detected by the system. The first set is the most important one, as it contains the events which should be included in the use-case-based requirements specification. Therefore, the keywords which helped identify the events from the first
set might be considered more useful. During the experiments, the participants who
used the H4U method were asked to choose which keyword helped them to iden-
tify the event (according to the H4U report table as presented in Figure 6.5). We
counted how many events were identified with usage of a given keyword. As pre-
sented in Table 6.5, some of the keywords were used very often (e.g., NO - about
48%) while other keywords were used very rarely (e.g., AS WELL AS - less than 1%)
and their contribution to the identification of detectable events was very small. The
presence of these very rarely used keywords might have made the analysis more te-
dious and could negatively influence the results. It appears that some of the key-
words might be merged, e.g. MORE and AS WELL AS; some of the participants were
using these keywords interchangeably. The pairs EARLY - BEFORE and LATE - AFTER were similar: although the meanings of the keywords were different, they led to the discovery of very similar events. Merging these pairs would decrease the number of keywords, which could shorten the time needed for events identification with
the H4U method. Moreover, as noted in Section 4, the median review speed of the
H4U method was not higher than 0.82, so there is a lot of room for improvement.
On the other hand, the introduction of additional, more specific keywords might
help to increase the number of identified events. However, it should be noted that
the use of keywords might have been different if requirements specifications from
different domains were used; hence, for every business domain the presence of particular keywords should be carefully analysed. In the case of a business domain which has never been analysed with the H4U method, where there is no experience in using the keywords, using the standard set of keywords is advised.
Furthermore, it is possible that the order of the keywords (they were presented
in the following order: NO, MORE, LESS, AS WELL AS, PART OF, REVERSE, OTHER
THAN, EARLY, LATE, AFTER, BEFORE) might influence how they were used by the
participants. There is a need for further research to see how the set of the secondary
keywords could be altered to increase the accuracy and review speed of the H4U
method.
Table 6.5: Percentage usage of HAZOP secondary keywords in both experiments (used in order to identify: detectable use-case events, detectable non-use-case events and undetectable events).

Keyword       Detectable         Detectable             Undetectable
              use-case events    non-use-case events    events
NO            47.5               44.4                   38.5
MORE          5.2                4.2                    10.0
LESS          12.1               3.0                    12.0
AS WELL AS    0.4                0.8                    1.7
PART OF       7.7                3.4                    10.7
REVERSE       6.0                11.0                   3.8
OTHER THAN    5.8                5.3                    10.7
EARLY         2.7                4.2                    2.3
LATE          1.9                7.2                    4.0
BEFORE        8.5                7.8                    4.6
AFTER         2.0                9.1                    1.8
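The per-keyword tally behind a table like Table 6.5 can be sketched as follows; the event records and class labels below are hypothetical, stand-ins for the keyword choices the participants marked on the H4U report forms:

```python
from collections import Counter

# Each identified event is recorded with the keyword that triggered it and
# the class it was later assigned to (labels here are illustrative).
events = [
    ("NO", "detectable use-case"), ("NO", "detectable use-case"),
    ("LESS", "detectable use-case"), ("MORE", "undetectable"),
    ("AS WELL AS", "undetectable"),
]

totals = Counter(cls for _, cls in events)   # events per class
pairs = Counter(events)                      # events per (keyword, class)
for (kw, cls), n in sorted(pairs.items()):
    # Percentage of the class's events attributed to this keyword.
    print(f"{kw:<11} {cls:<20} {100 * n / totals[cls]:5.1f}%")
```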
6.6 Related Work
Many authors have noted the problem of requirements specification completeness
with the focus on events associated with use cases. [120] included the following
question in his inspection checklist for use cases: Are all known exception conditions
documented? [83] identified 5 characteristics concerning requirements specification completeness; one of them reads: all scenarios and states are recognized. [35]
suggested 3 types of requirement defects, from the least severe, which have minimal
impact, to the most severe. To deal with the last type of defects, Cox et al. included
the following question in their use-case inspection guidelines: Alternatives and ex-
ceptions should make sense and should be complete. Are they? In order to solve the
issue of incomplete exceptions Carson proposed a 3-step approach for requirements
elicitation ([28]). The goal of the third step in this approach is to verify requirements
completeness, based on the idea of a complementary set of conditions under which
the requirements are to be performed ([27]).
Although the processes and approaches mentioned above, like many others ([18, 22]), seem to emphasize the importance of event-completeness, they do not propose any specific method or technique aimed at identifying events. Our proposal is
to adopt the HAZOP approach for this purpose.
HAZOP has already been used in many software development contexts. [102]
described how HAZOP could be used for the analysis of safety issues in software
systems. [63] proposed UML-HAZOP, a method for discovering anomalies in UML
diagrams. He uses HAZOP to obtain a checklist against which the quality of UML
diagrams can be assessed.
None of the above authors has applied HAZOP to use cases. [17] were the first to apply HAZOP to scenarios and use cases. Their aim was to identify hazards in safety
critical systems. In their approach they analyzed pre-, post- and guard-conditions
together with system responses. The output of this approach is a list of possible hazards and failures of the system. Another example of applying HAZOP for analysis of use
cases is the work by [111]. She and her coworkers were interested in identification of
security requirements. Their approach requires complete use cases (pre-conditions,
main scenarios and extensions, including events and alternative scenarios) as an in-
put. With the help of HAZOP keywords, they determined possible security require-
ments and policies.
The two above approaches might seem similar to the proposed H4U method as
they are applying HAZOP to a use case. In fact they are different. [111], and [17]
assume more or less complete use cases as an input, and the output is a report on
possible security and safety issues. Moreover, there is a difference in the keywords
used by [111], and [17], and H4U. As described in Section 2, H4U is part of a func-
tional requirements elicitation, whereas the approaches presented by [111], and [17]
are concerned with nonfunctional requirements. Analysis of use cases has also been
studied by others ([16, 112, 113]); however, they are not using HAZOP at all.
6.7 Conclusions
Event completeness of use cases is one of the quality factors of use-case based re-
quirements specification. Missing events might lead to higher project costs ([114]).
Therefore, a method called H4U (HAZOP for Use Cases) aiming at events identifi-
cation, was proposed in the paper and evaluated experimentally. The method was
compared to the ad hoc review approach. The main contribution of this paper is
that the accuracy of events identification was unsatisfactory in both cases; however, usage of the HAZOP-based method led to significantly better accuracy.
The results led to the following more detailed conclusions:
• The maximum average accuracy of events identification in both conducted
experiments was 0.26. This shows that events identification is not an easy task,
and many events might be omitted during the analysis, hence new methods
and tools are needed to mitigate this problem.
• The proposed HAZOP-based method can help in achieving higher accuracy of
events identification. This is true in two distinct environments: IT professionals (0.19 accuracy with H4U and 0.15 with the ad hoc approach) and students (0.26 accuracy with H4U and 0.17 with the ad hoc approach).
• H4U enables higher accuracy of events identification; however, it requires more time to perform a review. In both groups, IT professionals and students, the ad hoc approach was more efficient in terms of the number of steps analyzed. On average, the ad hoc review speed varied from 1.77 steps analyzed per minute by professionals to 2.23 steps analyzed per minute by junior analysts, while the review speed of the H4U method was 1.04 and 0.64 steps analyzed per minute, respectively.
• Students versus professionals. There is a common doubt as to whether an ex-
periment with the participation of students could provide results that remain
valid for professionals. One of the most debatable questions regards the differ-
ences between the effectiveness of (typically inexperienced) students in com-
parison to more experienced professionals. Some of the research performed
so far indicates that for less complex tasks the difference between students
and professionals is minor ([51]). In the experiments described in the paper
we observed that master-level SE students can even outperform professionals
in some cases.
Acknowledgements
We would like to thank all participants of the experiments for their active partici-
pation. This research was funded by the Polish National Science Center based on the decision DEC-2011/01/N/ST6/06794 and by the PUT internal grant DS 91-518.
Chapter 7
Automatic identification of events
in use cases
Preface
This chapter contains the report: Jakub Jurkiewicz, Jerzy Nawrocki, Automated
events identification in use cases, Politechnika Poznanska, RA-5/201.
My contribution to this chapter included the following tasks: (1) analysis of a set
of real-life use-cases in order to build a knowledge-base about the abstract types of
events and actions occurring in use-cases; (2) elaboration of inference rules aimed
at identification of abstract types of use-case event; (3) implementation of the proto-
type tool (including inference, NLP and NLG layers); (4) evaluation of the accuracy
and speed of the proposed method; (5) conducting and evaluating the Turing-test
based experiment.
Context: The results from Chapter 6 show that identification of events in use-cases
is not an easy task. Participants of the experiments were able to identify, on aver-
age, only 18% of possible events when using the ad hoc approach and 26% of pos-
sible events when using the HAZOP-based method. Such incompleteness of events-
identification might have a negative influence on software development phases, which
follow requirements elicitation and analysis. Moreover, we observed that manual review of use-cases is a time-consuming process. Thus, the challenge is how to speed up identification of events by automating this activity in such a way that the completeness of the obtained sets of events is improved.
Objective: The goal of this Chapter is to propose a method of identification of events
in use cases, evaluate its accuracy, and compare it with accuracy of events identifi-
cation performed by humans.
Method: To propose a method of automated identification of events, 160 use cases
from software projects have been analysed. This analysis resulted in identification
of 14 abstract types of events and two inference rules, which became a founda-
tion for the proposed method. The method was implemented as a plugin to UC
Workbench, a use-case editor. The effort required to identify events in use cases
(review speed) and completeness of events-identification (accuracy) performed by
the proposed method was evaluated with the use of a benchmark specification and
compared to the results for the ad hoc and HAZOP-based approaches. Moreover, to
evaluate the linguistic quality of the automatically identified events, an experiment
based on assumptions of the Turing-test was conducted.
Results: The conducted research resulted in the following findings:
• An automated method for identification of events was proposed. The ob-
served accuracy of this method was at the level of 0.8.
• The proposed automated method achieved higher accuracy than the man-
ual approaches: the maximum average accuracy for the ad hoc approach was
equal to 0.18 and for the HAZOP-based method it was equal to 0.26.
• The automated method, although not optimized for speed, was faster than
the manual ones. It was able to analyse, on average, 10.8 steps per minute,
while the participants of the experiments were able to analyse, on average, 2.5
steps per minute with the ad hoc approach, and 0.5 steps per minute with the
HAZOP-based approach.
• The understandability of event descriptions generated by computer was not
worse than the understandability of descriptions written by humans.
Conclusion:
The proposed method of events identification is faster and more accurate than
the manual methods (the ad hoc method and the HAZOP-based one). The under-
standability of event descriptions generated by computer is not worse than the un-
derstandability of descriptions written by humans.
7.1 Introduction
Requirements highly influence different aspects of software development [67, 78, 86,
92]. One of the approaches to specifying functional requirements is the concept of
the use case. It was proposed by Ivar Jacobson [59] and further developed by Alistair
Cockburn [9, 31] and other authors [53]. Roughly speaking, a use case is a descrip-
tion of user-system interaction expressed in natural language. The focal element
of a use case is the main scenario consisting of a sequence of steps that describe
the mentioned user-system interaction. While a step is being executed, some events
might happen. Descriptions of those events along with alternative scenarios are also
included in a use case.
The question of how to identify a (more or less) complete list of possible events
arises. Research shows that non-identified events (some call them “missing logic
and condition tests for error handling” [114]) are the most common cause of changes
in requirements which result in requirements volatility and creeping scope. The im-
portance of event-completeness has been recognized by many authors (although
they do not use this specific term), including Carson [28], Cox [35], Mar [83] and
Wiegers [120]. The approach they recommend is based on identification of events
performed by experts using brainstorming or other “manual” techniques. There is
also a method called H4U [65]. It is a form of a HAZOP-based review, which aims
at identifying events in use cases. The main disadvantage of manual identification
of events is the time and effort required by such a process. Thus, a question arises
whether it is possible to identify events automatically and obtain good quality event
descriptions in the sense of event-completeness and their readability.
The above question is also interesting from the artificial intelligence point of
view. The following saying is attributed to Pablo Picasso: Computers are useless. They
can only give you answers [2]. Identifying events that might happen during execution of a use-case step resembles a dialog in which a human is saying a simple
sentence (e.g. This evening I am going to the opera house. — that corresponds to a
use-case step), and a computer is asking a question (e.g. What if all the tickets are
sold? — that question includes the event all the tickets are sold). Asking such event-related questions requires understanding a sentence, i.e. some knowledge is necessary (in this example it is knowledge about opera and tickets), as well as the ability to infer events with the use of that knowledge.
In this paper a method of automatic identification of events in use cases is pre-
sented. First, in Section 7.2, the model of use-case based requirements specification
is outlined. The central part of the method is the inference engine that was used
for the generation of event names. It is described in Section 7.3. The mechanism
for analysing a use-case step with a Natural Language Processing (NLP) tool is pre-
sented in Section 7.4. The events are generated using Natural Language Generation
(NLG) techniques — see Section 7.5. Empirical evaluation of the proposed approach
is discussed in Section 7.6. Related work is presented in Section 7.7.
ID: UC1
Name: Edit a post
Actor: Author
Main scenario:
1. Author chooses a post.
2. Author chooses the edit option.
3. System prompts for entering the changes.
4. Author enters the changes to the post.
5. Author confirms the changes.
6. System stores the changes.
Extensions:
4.A. Post with the entered title already exists.
4.A.1. System displays an error message.
Figure 7.1: Example of a textual use case. Main elements: 1 - name, 2 - main scenario section, 3 - extensions section, 4 - an event, 5 - an alternative scenario.
7.2 Use-case based requirements specification
In this paper it is assumed that a functional requirements specification document
contains the following elements:
• Actors - descriptions of actors interacting with the system being built.
• Information Objects - presentation of domain objects the actors and the sys-
tem operate on. Every information object can have a set of different properties
assigned to it (the possible properties are described in Section 7.3).
• Use Cases - descriptions of user-system interactions in the form of scenarios
performed in order to achieve user’s goals. Every use case consists of a name
describing a user’s goal and a main scenario consisting of a sequence of steps.
Steps in the main scenario may be interrupted by events described within ex-
tensions. Additionally, alternative scenarios describe how given events should
be handled. An exemplary use case is presented in Figure 7.1.
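For illustration only, the three kinds of specification elements above can be pictured as a small data model. The class and field names below are our own sketch (they are not part of UC Workbench or any cited tool):

```python
from dataclasses import dataclass, field

@dataclass
class Actor:
    name: str                                       # e.g. "Author"

@dataclass
class InformationObject:
    name: str                                       # e.g. "Post"
    properties: set = field(default_factory=set)    # e.g. {"COMPOUND"}, see Section 7.3

@dataclass
class UseCase:
    id: str
    name: str                                       # the user's goal, e.g. "Edit a post"
    actor: Actor
    main_scenario: list                             # numbered steps, plain sentences
    extensions: list = field(default_factory=list)  # events + alternative scenarios

# The use case from Figure 7.1, abbreviated:
uc1 = UseCase(
    id="UC1",
    name="Edit a post",
    actor=Actor("Author"),
    main_scenario=["Author chooses a post.",
                   "Author chooses the edit option."],
)
print(uc1.name, len(uc1.main_scenario))
```

A use case stub in this encoding is simply an instance whose extensions list is still empty; the method proposed in this chapter fills it in.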
Use cases have been used in both research and industry since their introduction
in 1985 [59]. Research shows that scenario-based notations (including use cases) are
the most popular way of describing customer requirements [92]. Two main factors
are responsible for their popularity: firstly, the clear focus on describing the goals of the system; secondly, the natural language in which the requirements are expressed. The latter makes it easier for the customer and prospective end-users to understand the requirements document; however, it makes it hard to process the specification in an automatic way.
7.3 Inference engine for identification of events
7.3.1 The problem
It is assumed that a functional requirements specification on the input contains de-
scriptions of actors, descriptions of information objects, and use case stubs, i.e. use
cases without their extension parts (see Fig. 7.1). To complete use case stubs with
event descriptions the following problem is discussed in this paper:
Events identification problem: Given a step from the main scenario of a use case, identify all the events that can happen when the step is executed. Only events which
can be detected by the system are of interest (that viewpoint is supported by Adolph
et al. [9] in their Detectable conditions pattern).
7.3.2 Abstract model of a use-case step
The process of event identification consists of four phases:
1. Preprocessing of the description of actors and information objects that creates
a dictionary of these elements (it resembles the creation of a symbol table by
a compiler during parsing).
2. Analysing a step using NLP techniques (a main scenario is analysed step by
step).
3. Inferring the possible events for a given step (that results in abstract descrip-
tions of events which are independent of natural language).
4. Generating event descriptions in natural language.
The overview of this process is presented in Figure 7.2. The most important part of the presented method is the inference about the possible events.
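As a rough sketch, the four phases can be read as a pipeline. The stub functions below only illustrate the data flow between the phases; their names and toy logic are ours, not the actual UC Workbench plugins:

```python
# Illustrative stubs for the four phases of event identification.
def preprocess(actors, objects):                 # Phase 1: build the dictionary
    return {"actors": set(actors), "objects": set(objects)}

def analyse_step(step, dictionary):              # Phase 2: NLP -> <R, A, O>
    words = step.rstrip(".").split()
    subject, verb, obj = words[0], words[1], words[-1]
    role = "USER" if subject in dictionary["actors"] else "SYSTEM"
    return role, verb.upper().rstrip("S"), obj.lower()

def infer_events(role, activity, obj):           # Phase 3: inference engine
    observed = {("USER", "ENTER", "post"): ["Incomplete"]}   # toy Observed set
    return observed.get((role, activity, obj), [])

def describe(event, step):                       # Phase 4: NLG
    return f"{event} (may occur in step: {step})"

def identify_events(steps, actors, objects):
    dictionary = preprocess(actors, objects)
    return {s: [describe(e, s) for e in infer_events(*analyse_step(s, dictionary))]
            for s in steps}

print(identify_events(["Student enters a post."], ["Student"], ["post"]))
```

The real phases are far richer (full NLP, the complete Observed set, SimpleNLG); the sketch only shows how the output of each phase feeds the next.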
Our approach to inference about events is based on a model of human-computer
interaction, which evolved from our study of available use cases and events associated with them. Our aim was to provide the simplest possible model that would allow inference about all the events we have found in the considered use cases. The model is twofold and is illustrated in Figure 7.3.
Model of the real world (Fig. 7.3A). Some events are a consequence of a state in
which a real object exists (e.g. a store is empty). The model represents real world
objects (e.g. a book in a library, an article in a newspaper). Interestingly, for the purpose of inferring events, quite a simple tool proved to be powerful enough: set theory. A fragment of the real world we are interested in can
be viewed as individual objects (i.e. items) or collections of objects (i.e. sets). Fur-
thermore, items might be simple or they might be compound and consist of other
items. It is important to notice that some activities do not change the real world
[Figure: Analyst enters use-case stubs and descriptions of actors and information objects; preprocessing builds a dictionary of actors and information objects; Natural Language Processing produces abstract models of use-case steps (actors, activities, information objects); the Events Inference Engine assigns event types to use-case steps; Natural Language Generation yields use-case stubs completed with events.]
Figure 7.2: An overview of use-case processing and events analysis: 1 - initial preprocessing of use-case stubs, actors and information objects descriptions (using NLP tools); 2 - generation of abstract classes of events (events inference engine); 3 - events names generation (using NLG tools).
- they just provide us with information about the world (state) as-is. On the other
hand, there are some activities which change the world (i.e. its state) and they lead
to a new state (as-to-be). In our view, without awareness of the last category of activ-
ities, identifying some events would be impossible. Moreover, the state of real world
objects can change over time, e.g. a credit card can expire on some day.
Model of IT system (Fig. 7.3B). Some events are associated with human-computer
communication or computer-computer interaction (e.g. an external service is not
available). This model assumes that all the interactions between the Actor and the System happen through the user interface (UI). The Actor can perform CRUD+SV operations (Create, Read, Update, Delete, Select, Validate) on data (both internal and
external to the System). In the case of operations which can be harmful (e.g. deleting
an information object), the Actor might need to confirm his operation, so Confirm
[Figure: A - the real world modelled as simple and compound items and sets, with state-changing activities leading from the as-is state to the as-to-be state; B - the Actor interacting with the System through the UI by means of CRUD+SV and Confirm operations, the System responding with Display (of data or of a problem), operating on internal and external data.]
Figure 7.3: A - Model of real world, B - Model of IT system.
has been added to our model. Moreover, the System may display data or may inform
the Actor about the occurrence of a problem - that is reflected in our model via the
Display operation. The behavior of the Actor and the System may depend on time,
i.e. performing some operations might take too much time and result in a timeout
or it might be too late for some operations to be completed. To some extent, the
model of IT system resembles Albrecht’s model [11] on which the function points
method [12] is based. Unfortunately, Albrecht’s model cannot be directly used for
inference about events. For instance, some events are strongly associated with the
delete operation, which is not directly represented in Albrecht's model. There is a similar problem with the confirm operation.
It is assumed that every step of a use case can be represented as a triplet: Actor,
Activity, Information Object [9, 109]. Those three notions, along with the notion of
Event, are described below.
Actor
Actor represents an abstract entity which performs an action in a given step. There
are two types of actors: SYSTEM - represents the system being built; USER - represents one of the end-users of the system. In the example from Figure 7.1, Author, who is an actor in steps 1, 2, 4 and 5, has the USER type. System, which is an actor in steps 3 and 6, is of the SYSTEM type. Every actor of type USER needs to be declared as such by Analyst.
Activity
Activity is a process triggered by Actor and it manipulates Information Objects (In-
formation Objects are described below). To identify events that can appear when a
step is executed (i.e. activity described by it is performed), we have decided to iden-
tify abstract types of activities. The types of activities were derived from the model
of IT system presented in Figure 7.3.
Assumption #1:
Each step of a use case can be classified as one of the activity types presented in Table
7.1.
JUSTIFICATION: Some of the proposed activity types are directly related to the CRUD+SV operations from the model of IT system. Most of the CRUD+SV operations are simple from the point of view of the presented model of real world, and each of them is assigned a single activity type. For the sake of simplicity, we name such activity types after the operations corresponding to them. Here are the activity types corresponding to simple operations: READ, UPDATE, DELETE, SELECT, VALIDATE.
The Create operation is not simple: it exhibits susceptibility to different events depending on the object it manipulates. We have identified three activity types that can be assigned to this operation:
• ADD - when a new Information Object is added,
• ENTER - when data concerning part of Information Object is entered,
• LINK - when a connection between Information Objects is established.
Moreover, the confirm operation from the model of IT system is represented by CONFIRM activity
type and the display operation is represented by DISPLAY activity type.
o
Information Object
Information Object represents an object the actor operates on. The use case pre-
sented in Figure 7.1 operates on a post which is an example of Information Ob-
ject. Information Object is assigned a set of properties that are important from
the point of view of event identification. Table 7.2 presents a set of properties we
have found helpful when identifying events. The properties are divided into two
Activity type Exemplary use-case steps
ADD Author adds new blog post.
ENTER Author enters title of the post.
LINK Author links the comment to the post.
READ Author browses available blog posts.
UPDATE Author changes the title of the post.
DELETE Author deletes the post.
SELECT Author selects a blog post from the available ones.
VALIDATE Author checks the blog post.
CONFIRM Author confirms the changes.
DISPLAY System presents available blog posts.
Table 7.1: Identified types of activities of use-case steps.
sets called Activity-Free Properties (AFP) and Activity-Sensitive Properties (ASP). The former are valid for an Information Object no matter which activity is performed on this object, e.g. Article always has the SET property, no matter what operation is performed on it. These properties correspond to the elements of the models presented in Figure 7.3: Set and Compound Items. Moreover, there might be some cardinality constraints imposed on the object, hence two additional properties have been added: AT_LEAST (the smallest number of elements) and NO_MORE_THAN (the highest number of elements). Furthermore, some objects can be stored in an external system. This kind of objects should be marked with the REMOTE property.
ASP properties depend on Information Object and an activity performed on it.
ASP properties are related to the time aspect of operations of the model of IT system.
Assume an invoice can be exported and also some other operation can be performed on it. If only the export operation can take too long, which can lead to specific events, then the information object Invoice will have the LONG_LASTING property only for the export operation.
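In implementation terms, AFP and ASP can be viewed as simple mappings from information objects to properties. The sketch below is illustrative only; the object and property names are taken from the examples above, and "EXPORT" stands in for whichever abstract activity type the export operation maps to:

```python
# Activity-Free Properties: hold for an object regardless of the activity.
AFP = {
    "Article":      {"SET"},
    "CreditCard":   {"COMPOUND", "REMOTE"},
    "Participants": {"SET", "AT_LEAST", "NO_MORE_THAN"},
}

# Activity-Sensitive Properties: pairs <P, A> - property P holds only for activity A.
ASP = {
    "Invoice":   {("LONG_LASTING", "EXPORT")},   # export may take a long time
    "Admission": {("TIMEOUTABLE", "ADD")},       # adding has a deadline
}

def afp(obj):
    """Properties of `obj` that are independent of the activity."""
    return AFP.get(obj, set())

def asp(obj):
    """<P, A> pairs of `obj`; P exhibits itself only during activity A."""
    return ASP.get(obj, set())

print(afp("CreditCard"), asp("Invoice"))
```

Declaring these mappings is Analyst's job during preprocessing; the inference rules of Section 7.3.3 then consult them.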
7.3.3 Events
Assumption #2:
Each event which can occur in a use case can be classified as one of the event types
presented in Table 7.3.
JUSTIFICATION: To identify event types one can use a HAZOP-like brainstorming session (see Sec.
6.3.1). As primary keywords one can choose activity types of Table 7.1. The CRUD+SV activities should
be considered in the following six variants:
{Internal data, External data} x {Simple, Compound, Set}
The secondary keywords can be the same as proposed in the HAZOP method (NO, MORE, etc. - see
Section 6.3.1). One can go through the list of primary keywords (i.e. all the variants of activity types),
Property - Description - Example
Activity-Free Properties (AFP):
SET - represents objects which form a set (they have to be unique) - Article
COMPOUND - represents objects which consist of other objects - Credit Card (consists of holder name, number, expiration date, security code)
AT_LEAST - represents objects whose quantity must be at least X - Participants (number of which must be at least 10)
NO_MORE_THAN - represents objects whose quantity must be no more than X - Participants (number of which cannot be larger than 20)
REMOTE - represents external objects - Credit Card (in order to process information concerning Credit Card, System needs to contact an external payment service)
Activity-Sensitive Properties (ASP):
LONG_LASTING for activity X - given activity X for this kind of objects can take a significant amount of time - Invoice (export of which may take a significant amount of time)
TIMEOUTABLE for activity X - given activity X for this kind of objects is not possible after some deadline - Admission (there might be a deadline for admissions to the university)
Table 7.2: Identified properties of information objects together with their descriptions and examples.
one by one, and for each one consider all the secondary keywords to identify possible event types. A
list of identified event types is maintained through the brainstorming session. After the brainstorming,
that list is analyzed, item by item, to:
• remove repeated or undetectable event types,
• clarify their descriptions,
• find examples of a proposed event type in the available real-life use cases.
The results of such a process are presented in Table 7.3. Interestingly, four HAZOP secondary keywords (AS WELL AS, PART OF, BEFORE, AFTER) did not contribute to any of the identified event types (they seem useless in this process).
o
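The keyword grid traversed in such a session can be enumerated mechanically; the sketch below is illustrative, and the secondary-keyword list shown is an assumed subset of the standard HAZOP guide words (see Section 6.3.1):

```python
from itertools import product

# Primary keywords: each CRUD+SV operation considered in six variants.
operations = ["Create", "Read", "Update", "Delete", "Select", "Validate"]
variants = list(product(["Internal data", "External data"],
                        ["Simple", "Compound", "Set"]))

# Secondary keywords (assumed subset of the HAZOP guide words).
secondary = ["NO", "MORE", "LESS", "REVERSE", "OTHER THAN", "EARLY", "LATE"]

# Every (operation, variant, secondary keyword) triple is one brainstorming prompt.
prompts = list(product(operations, variants, secondary))
print(len(variants), len(prompts))
```

Even this small grid yields a few hundred prompts, which explains why the manual session is laborious and why the resulting list has to be pruned afterwards.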
The process of identification of events in use-case steps is based on inference
rules formulated after the analysis of real-life use cases. The rules are expressed in a
form similar to the propositional logic, i.e.:
    premise
    ──────────
    conclusion
By analysing available use cases we have discovered two inference rules called sim-
ple and compound. In order to keep the rules brief and clear the following symbols
are used: R for role (i.e. actor type), A for activity type, O for information object, P
for property of information object, E for event type.
Simple inference rule of events identification
The following notions are used by the simple inference rule:
Type of event - Description - Examples of real-life events
WrongData - provision of wrong/invalid data - ISBN number is invalid
TooLittleLeft - there is not enough of information object - Bank account is empty
Incomplete - not all data has been provided - ISBN of a book has not been provided
AlreadyExist - given data already exist - Book with the given ISBN already exists
ConnectionProblem - problem with the connection with a remote system - Could not connect to the credit card operator
TooMuchSelected - too much data has been provided - Too many subjects selected
NoObjectSelected - no data has been selected from the given options - No subject selected
AlreadySelected - given data has already been selected - The subject has already been selected
TooLittleSelected - too little data has been provided - Too few subjects selected
DoesNotExist - given data does not exist - Given article does not exist
LackOfConfirmation - operation has not been confirmed - Author does not confirm
NoDataDefined - there is no information object of a given type defined - There are no subjects defined
WantsToInterrupt - actor wants to interrupt an operation - Author cancels export operation
TooLateForDoingThis - it is too late to perform given operation - Deadline for adding new subjects has passed
WantsToWithdraw - given operation is withdrawn - NOT FOUND in the analysed UCs
NoDataProvided - no data was provided - NOT FOUND in the analysed UCs
LateConfirmation - confirmation was performed too late - NOT FOUND in the analysed UCs
TooEarlyForDoingThis - it is too early to perform given operation - NOT FOUND in the analysed UCs
TooMuchDataProvided - too much data was provided - NOT FOUND in the analysed UCs
Table 7.3: Identified event types together with their descriptions and examples.
• Observed is a finite set of axioms. Each axiom has the form of a quadruple <R, A, P, E> and it denotes an observation that an activity of type A performed by an actor of type R on an information object with property P can lead to an occurrence of an event of type E. The Observed set of axioms we have identified so far is presented in Table 7.4. Obviously, this set can be easily extended by adding new axioms resulting from further research.
• AFP (stands for Activity-Free Property) is a function. Its domain is a finite set
of information objects described in a requirements specification document
(see Section 7.2). This function returns a finite set of properties of an infor-
mation object.
It is assumed that every use-case step, originally written in natural language, can be
represented by the following triplet: actor type, activity type and information object.
It means that each step is a simple sentence in the form of Subject-Verb-Object. This
assumption is supported e.g. by Cockburn’s guideline [31] which says that every step
structure should be simple and consist of subject, verb and object. Therefore, a use-case step is represented as S = <R, A, O>. Transformation of an original step to its abstract representation <R, A, O> is done with the help of NLP tools (as described in Section 7.4).
The simple inference rule assumes that properties of information objects de-
pend solely on information object, not on an activity. As events can appear but do
not have to, the rule uses the ♦ symbol from modal logic to denote this. The rule is
presented below:
    S = <R, A, O>,  P ∈ AFP(O),  <R, A, P, E> ∈ Observed
    ─────────────────────────────────────────────────────     (7.1)
    S ⊨ ♦E
An example of applying this rule is presented in Figure 7.4.
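Rule (7.1) translates almost literally into code. The following minimal sketch (our own encoding) uses a four-axiom excerpt of the Observed set from Table 7.4 and an AFP mapping declared by the analyst:

```python
# Excerpt of the Observed set: axioms <R, A, P, E> from Table 7.4.
OBSERVED = {
    ("USER", "ENTER", "SET",      "WrongData"),
    ("USER", "ENTER", "COMPOUND", "Incomplete"),
    ("USER", "ENTER", "SET",      "AlreadyExists"),
    ("USER", "ENTER", "REMOTE",   "ConnectionProblem"),
}

AFP = {"Post": {"COMPOUND"}}   # activity-free properties per information object

def simple_rule(step):
    """Rule (7.1): S |= possibly E  if  P in AFP(O) and <R, A, P, E> in Observed."""
    role, activity, obj = step                       # S = <R, A, O>
    return {e for (r, a, p, e) in OBSERVED
            if r == role and a == activity and p in AFP.get(obj, set())}

print(simple_rule(("USER", "ENTER", "Post")))        # the step from Figure 7.4
```

With the full Observed set of Table 7.4 and the complete AFP function, the same comprehension yields all events the simple rule can derive for a step.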
Compound inference rule of use-case events
Unfortunately, in some cases the simple inference rule is too weak. Some events
depend not only on an information object but also on an activity performed on it. To
cope with this, the compound inference rule has been introduced. This rule uses the
notion of activity-sensitive properties (ASP for short). It is a function whose domain
is a finite set of information objects (as in the case of AFP). The function returns a
finite set of pairs < P, A >, where P denotes a property of information object and A
denotes an activity type that has to take place in order for property P to exhibit itself.
The compound inference rule is presented below:
    S = <R, A, O>,  <P, A> ∈ ASP(O),  <R, A, P, E> ∈ Observed
    ──────────────────────────────────────────────────────────     (7.2)
    S ⊨ ♦E
To show how the above rule can be used let us extend the example from Figure 7.4.
Assume that the students have to submit their posts by a certain deadline. In this
case, knowing this, Analyst should classify post as an information object which is
TIMEOUTABLE for the write in activity (write in corresponds to the ENTER activity
type). As a result of parsing the declaration of information objects, the ASP function
is augmented with the following item:
<Post, ENTER, TIMEOUTABLE>.
When it comes to parsing the step Student writes in a post, it is translated to the following triplet:
<USER, ENTER, Post>.
Actor (R) - Activity (A) - Information Object Property (P) - Event type (E)
USER - ENTER - SET - WrongData
USER - ENTER - COMPOUND - Incomplete
USER - ENTER - SET - AlreadyExists
USER - ENTER - REMOTE - ConnectionProblem
USER - ENTER - AT_LEAST - TooLittleLeft
SYSTEM - DISPLAY - SET - NoDataDefined
USER - SELECT - SET - NoDataDefined
USER - SELECT - SET - AlreadySelected
USER - SELECT - NO_MORE_THAN - TooMuchSelected
USER - SELECT - * - NoObjectSelected
USER - DELETE - * - DoesNotExist
USER - READ - * - DoesNotExist
USER - LINK - * - Incomplete
USER - SELECT - AT_LEAST - TooLittleSelected
USER - CONFIRM - * - LackOfConfirmation
SYSTEM - any of ADD, ENTER, LINK, READ, UPDATE, DELETE, SELECT, VALIDATE, CONFIRM, DISPLAY - LONG_LASTING - WantsToInterrupt
USER - any of ADD, ENTER, LINK, READ, UPDATE, DELETE, SELECT, VALIDATE, CONFIRM, DISPLAY - TIMEOUTABLE - TooLateForDoingThis
Table 7.4: Elements of the Observed set; the star means that the row is valid for any value of the properties of the information object.
Since the Observed set contains the quadruple
<USER, ENTER, TIMEOUTABLE, TooLateForDoingThis>
and ASP(Post) contains
<TIMEOUTABLE, ENTER>,
the following conclusion is inferred: the event TooLateForDoingThis can happen when executing the step.
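The same example can be replayed in code. The sketch below (our own encoding, mirroring the simple-rule sketch) implements rule (7.2) with the single axiom and ASP entry used above:

```python
# The quadruple <R, A, P, E> from the Observed set used in this example.
OBSERVED = {("USER", "ENTER", "TIMEOUTABLE", "TooLateForDoingThis")}

# ASP(Post) contains the pair <TIMEOUTABLE, ENTER>.
ASP = {"Post": {("TIMEOUTABLE", "ENTER")}}

def compound_rule(step):
    """Rule (7.2): S |= possibly E  if  <P, A> in ASP(O) and <R, A, P, E> in Observed."""
    role, activity, obj = step                       # S = <R, A, O>
    return {e for (r, a, p, e) in OBSERVED
            if r == role and a == activity
            and (p, activity) in ASP.get(obj, set())}

print(compound_rule(("USER", "ENTER", "Post")))
```

Note the single difference from the simple rule: the property must be paired with the very activity of the step, which is exactly what makes the rule activity-sensitive.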
[Figure: from the declarations of actors (Student, Teacher) and information objects (Exam; Post with the COMPOUND property), the use-case step "Student writes in a post." is translated by Natural Language Processing into R = USER, A = ENTER, O = Post (with COMPOUND property); the inference engine, consulting the Observed set, identifies the event type Incomplete.]
Figure 7.4: An example of inferring events for step Student writes in a post.
7.4 Natural Language Processing of Use Cases
The presented inference engine works with the elements of the abstract model of use-case steps: actor type, activity type and information object. Before the engine can be used, every use case needs to be analysed with NLP tools and the elements of the mentioned model have to be identified.
7.4.1 Preprocessing
UC Workbench [88] is a tool used for use-case management. With this tool, Analyst can enter and manage his/her use-case-based requirements specification. The entered descriptions of actors and information objects are automatically transformed into a dictionary of actors and information objects.
[Figure: a use-case step passes through the chain Sentence Splitter → Tokenizer → Parts-of-speech analyzer → Lemmatizer → Grammatical-relations analyzer → Model element reference marker → Activities lookup tool, producing in turn sentences, words, parts of speech, lemmas, grammatical relations, references to actors and information objects, and activity types; actors' names, information objects' names and activity synonyms are auxiliary inputs.]
Figure 7.5: The chain of the used NLP tools: 1 - input provided by the analyst in a natural language (italic font), 2 - tool, 3 - output from one tool and input for another tool, 4 - output from the chain (bold font).
7.4.2 NLP
Additional plugins for UC Workbench performing natural language analysis were de-
veloped in order to analyse the entered use cases. These plugins are organized ac-
cording to the Chain of Responsibility design pattern [45]. It means that the text of a
use-case step goes through all of the tools defined in the chain. Moreover, an output
of one tool can be used as an input by another tool. Tools providing the following
operations are used in order to analyse a use-case step: sentence segmentation [24],
word tagging [4], part-of-speech tagging [3], lemmatization [64], dependency rela-
tions tagging [37]. The chain of the used NLP tools is presented in Figure 7.5. After
the NLP processing, each step is annotated with the elements of the step abstract
model: actor type, activity type, information object.
The presented NLP layer is language-independent, meaning that this approach could be used for any natural language, provided that adequate tools are available for the given language. In our research, use cases were written in English and they were analysed with the tools contained in the Stanford Parser kit [37] and the Open NLP project [1]. However, any other tools realising the above-mentioned NLP operations could be used, e.g. French parsers [48] and German ones [101].
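The Chain of Responsibility arrangement described above can be sketched as a list of stages that each read and extend a shared annotation record. The three toy stages below are drastic simplifications of the tools in Figure 7.5, not the actual plugins:

```python
def sentence_splitter(text, ann):
    ann["sentences"] = [s.strip() + "." for s in text.split(".") if s.strip()]

def tokenizer(text, ann):
    # Consumes the output of the previous tool in the chain.
    ann["tokens"] = [w.rstrip(".") for s in ann["sentences"] for w in s.split()]

def reference_marker(text, ann):
    # Marks references to declared actors (dictionary from the preprocessing phase).
    actors = {"Student", "Teacher"}
    ann["actor_refs"] = [w for w in ann["tokens"] if w in actors]

CHAIN = [sentence_splitter, tokenizer, reference_marker]

def analyse(step_text):
    ann = {}
    for tool in CHAIN:            # each tool may use what earlier tools produced
        tool(step_text, ann)
    return ann

print(analyse("Student writes in a post.")["actor_refs"])
```

The design choice matters: because every tool only appends annotations, swapping in a parser for another language means replacing individual stages without touching the rest of the chain.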
7.5 Generation of Event Descriptions
The inference engine described in Section 7.3 generates abstract types of events (depicted in Table 7.3). Those event types (e.g. TooLittleLeft) are like raw material and need further processing. Analyst needs to obtain a description of an event which is expressed in natural language and which is easily understood by the potential readers. Therefore, Natural Language Generation (NLG) was used as the final stage of events identification.
For every abstract event type a template for generation of event description has
been proposed. The templates together with examples are presented in Table 7.5. A
template consists of constant string literals and items from the use-case step (avail-
able after NLP phase): subject, verb, object. Moreover, additional phrase features
are included in the templates.
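Conceptually, a template instantiation is just slot filling plus inflection. The sketch below fills the WrongData template from Table 7.5 with a naive regular past-tense rule; it is our own illustration, whereas the actual tool delegates inflection and other phrase features to SimpleNLG:

```python
def past_tense(verb):
    # Naive regular inflection only ("enter" -> "entered"); SimpleNLG handles
    # irregular verbs and other phrase features in the actual tool.
    return verb + "d" if verb.endswith("e") else verb + "ed"

def wrong_data_description(subject, verb, obj):
    # Template: <subject> <verb [past tense]> wrong data of <object>
    return f"{subject} {past_tense(verb)} wrong data of {obj}."

print(wrong_data_description("Customer", "enter", "credit card"))
```

The subject, verb and object slots come straight from the NLP phase of Section 7.4, which is why the templates can be filled without any further analysis of the step.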
For the task of NLG, a Java library called SimpleNLG [46], which provides a data-
to-text mechanism, was used. SimpleNLG offers an API which allows for the set-
ting of semantic input for a phrase and direct control over the realisation process of
phrase generation (by setting various values of phrase features, e.g. tense, voice).
Based on this, a prototype tool combining all three stages of event generation
(NLP analysis, inference engine, NLG) was implemented. The tool was built as an
additional plugin for UC Workbench. Section 7.6 presents empirical evaluation of
the accuracy of the prototype tool and the understandability of descriptions of the
identified events.
7.6 Empirical evaluation
As mentioned in Section 7.5, a prototype tool combining NLP, inference engine, and
NLG was implemented as part of the UC Workbench project. This prototype has
been evaluated from three perspectives: accuracy of events identification (together with precision), speed of analysis of use-case steps, and understandability of the generated event descriptions.
7.6.1 Reference set of events
For the evaluation of the prototype, we have used experimental data concerning accuracy of events identification performed by humans [65]. Two experiments were conducted, one with 18 students of the Software Engineering programme at Poznan University of Technology, and the second with 64 IT professionals. The designs of both experiments were the same. The aim of the participants of both experiments was to identify as many events as possible for the main scenarios from the benchmark use-
Table 7.5: Information used for generation of descriptions of events. A template for every event type is presented, together with the exemplary data used for the generation of event description.
Event type - Event template - Exemplary use-case step (Subject / Verb / Object) - Generated description
WrongData - <subject> <verb [past tense]> wrong data of <object> - Customer enters credit card details. (Customer / enter / credit card) - Customer entered wrong data of credit card.
TooLittleLeft - Not enough <object [plural]> left - Customer enters number of books. (Customer / enter / book) - Not enough books left.
Incomplete - <subject> <verb [past tense, active voice]>, incomplete data of <object [singular]> - Author writes in an article. (Author / write in / article) - Author wrote in incomplete data of article.
AlreadyExist - <object> already exists. - Author writes in an article. (Author / write in / article) - Article already exists.
ConnectionProblem - Connection problem occurred. - Customer enters credit card details. (Customer / enter / credit card) - Connection problem occurred.
TooMuchSelected - Too many <object> <verb [past tense, passive voice]> by <subject> - Student selects books. (Student / select / book) - Too many books selected by Student.
NoObjectSelected - No <object> <verb [past tense, passive voice]> by <subject> - Student chooses a book. (Student / choose / book) - No book chosen by Student.
AlreadySelected - <object> has already been <verb [past participle]> by <subject> - Student selects a subject. (Student / select / subject) - Subject has already been selected by Student.
TooLittleSelected - Too little <object [plural]> <verb [past tense, passive voice]> by <subject> - Student chooses subjects. (Student / choose / subject) - Too little subjects chosen by Student.
DoesNotExist - Given <object> does not exist. - Author deletes an article. (Author / delete / article) - Given article does not exist.
LackOfConfirmation - <subject> <verb [present perfect tense, negative]> - Author confirms the operation. (Author / confirm / operation) - Author has not confirmed.
NoDataDefined - No <object> has been defined - System displays available articles. (System / display / article) - No article has been defined.
WantsToInterrupt - Operation canceled. - System imports invoice data. (System / import / invoice) - Operation canceled.
TooLateForDoingThis - Too late for this operation. - Author enters an article. (Author / enter / article) - Too late for this operation.
7.6. Empirical evaluation 95
case-based specification [14]. The identified events in the experiments were anal-
ysed by two experts and a set of sensible events for steps from the benchmark spec-
ification was distinguished. This set was used as a reference set of events for the
evaluation of the prototype tool.
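To make the template mechanism of Table 7.5 concrete, the sketch below instantiates an event template from step elements. The function name and the hand-written inflection table are assumptions for this example, not the actual UC Workbench implementation; in the prototype, inflection and realisation were done with SimpleNLG.

```python
# Illustrative sketch of filling a Table 7.5-style event template.
# The inflection table and function name are assumptions for this example;
# the actual prototype used SimpleNLG for inflection and realisation.

PAST_TENSE = {"enter": "entered", "choose": "chose", "select": "selected"}

def render_event(template: str, subject: str, verb: str, obj: str) -> str:
    """Fill a Table 7.5-style template with elements extracted from a step."""
    text = template
    text = text.replace("<subject>", subject)
    text = text.replace("<verb [past tense]>", PAST_TENSE[verb])
    text = text.replace("<object>", obj)
    return text + "."

# WrongData example from Table 7.5:
template = "<subject> <verb [past tense]> wrong data of <object>"
print(render_event(template, "Customer", "enter", "credit card"))
# -> Customer entered wrong data of credit card.
```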
7.6.2 Sensible events
Use-case steps are written in a natural language; therefore, their interpretation can
differ from reader to reader. From the perspective of use-case events this means that
different people might identify different events for a given use-case step. Some of
the identified events, although being correct in terms of language correctness, might
not make sense in the context of the use case. For example, for step 3
from Figure 7.1 the following event could be identified: Author answers a phone call.
This event is correct from the language perspective; however, it does not make
sense for the given step. The events which make sense in the context of a given use-
case step are desirable, and we will call them sensible.
7.6.3 Measures
The following measures were used in order to evaluate the prototype:
• Accuracy (of events identification) - the proportion between the number of
generated sensible event descriptions and the total number of events from
the reference set of events1. It was measured using the following equation:
Accuracy = \frac{\sum_{s=1}^{n} \#events(s)}{\sum_{s=1}^{n} \#Events(s)}   (7.3)

Where:

– n is the total number of steps in the benchmark specification;
– #events(s) is the number of distinct sensible events identified by the
prototype tool for the step s;
– #Events(s) is the number of distinct sensible events identified by all par-
ticipants in the experiments for the step s.
• Precision - the proportion of the identified events that were sensible. It was
measured using the following equation:

Precision = \frac{TP}{TP + FP}   (7.4)
The symbols TP, FP are defined by the confusion matrix presented in Figure
7.6.
1 This definition differs from the definition used in the domain of information retrieval. However, we wanted to be consistent with the definitions and terminology used in Section 6.4.2.
• Speed - the number of steps analysed in a given time. It was
measured using the following equation:

Speed = \frac{\#steps}{T} \; [steps/minute]   (7.5)

Where:

– #steps is the total number of steps analysed by the prototype;
– T is the total time spent on analysing the steps by the prototype.
                     Expert answer
                     y      n
Tool answer    Y     TP     FP
               N     FN     TN

Figure 7.6: Confusion matrix. T (true) - tool answer that is consistent with expert decision. F (false) - tool answer that is inconsistent with expert decision. P (positive) - tool positive answer (event was identified). N (negative) - tool negative answer (event was not identified).
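Under the definitions above, the three measures can be sketched as follows. The function names and the toy numbers are illustrative assumptions, not the experimental data.

```python
# Sketch of the three evaluation measures (Equations 7.3-7.5).

def accuracy(tool_events_per_step, reference_events_per_step):
    """Eq. 7.3: distinct sensible events generated by the tool, summed over
    all steps, divided by the total size of the reference set of events."""
    return sum(tool_events_per_step) / sum(reference_events_per_step)

def precision(tp: int, fp: int) -> float:
    """Eq. 7.4: share of generated events that were sensible (true positives)."""
    return tp / (tp + fp)

def speed(num_steps: int, minutes: float) -> float:
    """Eq. 7.5: number of analysed steps per minute."""
    return num_steps / minutes

print(accuracy([2, 1, 3], [3, 2, 3]))  # -> 0.75
print(round(precision(28, 2), 2))      # -> 0.93
print(speed(150, 15))                  # -> 10.0 steps per minute
```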
7.6.4 Procedure
The prototype tool was run for the whole benchmark specification [14]. For every
use-case step from the specification the time of the analysis by the tool was mea-
sured separately 2. After the analysis had been finished, the generated event de-
scriptions were evaluated by an expert. A similar procedure was used as in the
experiments from Section 6.4: the generated events were assigned to different ab-
stract classes of events. Based on this analysis, the identified events were divided
into sensible and insensible ones. Moreover, the measures from the confusion ma-
trix (presented in Figure 7.6) were calculated.
7.6.5 Results
Figure 7.7 presents the results for the prototype and for two methods used for events
identification in the experiments described in Chapter 6. H4U was a HAZOP-like
method described in Section 6.3.2. Ad hoc was a simple approach to events iden-
tification in which the participants of the experiments did not use any particular
method. Precision was not calculated for the H4U and ad hoc
approaches; hence, these results are not presented. The initial accuracy of the
prototype was equal to 0.73; however, after the analysis of the results it was found
2 The analysis was performed 10 times; the average times are presented. The following hardware was used to perform the measurements: MacBook Air, CPU: 1.8 GHz Intel Core i7, RAM: 4 GB 1333 MHz DDR3, HDD: 256 GB SSD.
that the initial Observed set used in the inference engine (see Section 7.3) could be
extended by adding new elements to it. After adding four new elements to the Ob-
served set (these elements are presented in Table 7.6) the accuracy rose to 0.89.
             Prototype     H4U Max   H4U Avg   Ad hoc Max   Ad hoc Avg
Accuracy     (0.73) 0.89   0.44      0.26      0.28         0.18
Precision    0.93          NA        NA        NA           NA
Speed        10.8          2.32      0.85      2.65         2.58
Figure 7.7: Results for the prototype and for H4U and ad hoc approaches.
Actor (R)   Activity (A)   Information Object Property (P)   Event type (E)
SYSTEM      FETCH          REMOTE                            ConnectionProblem
USER        UPDATE         SET                               WrongData
USER        ENTER          COMPOUND                          Incomplete
USER        DELETE         AT_LEAST                          TooLittleLeft
Table 7.6: Elements added to the Observed set based on analysis of the results of the experiment.
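The role of such entries can be sketched as a lookup from observed step characteristics to candidate event types. The tuple structure and names below are assumptions inferred from Table 7.6, not the actual inference-engine code described in Section 7.3:

```python
# Hypothetical sketch: Observed-set entries map (actor, activity, property)
# patterns to event types, following the columns of Table 7.6.
OBSERVED = {
    ("SYSTEM", "FETCH", "REMOTE"): "ConnectionProblem",
    ("USER", "UPDATE", "SET"): "WrongData",
    ("USER", "ENTER", "COMPOUND"): "Incomplete",
    ("USER", "DELETE", "AT_LEAST"): "TooLittleLeft",
}

def candidate_event_types(actor: str, activity: str, prop: str):
    """Return event types whose Observed-set pattern matches the step."""
    return [event for (a, act, p), event in OBSERVED.items()
            if (a, act, p) == (actor, activity, prop)]

print(candidate_event_types("SYSTEM", "FETCH", "REMOTE"))
# -> ['ConnectionProblem']
```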
7.6.6 Interpretation
The presented results for the implemented prototype and the two manual approaches
to events identification show that the accuracy and speed of events identification
can be greatly improved by using an automated tool.

The tool was not able to generate all of the possible sensible events because
some steps did not contain direct references to the information objects (instead
words like information or data were used). Another reason concerned events related
to the domain of the benchmark specification (a university admission system). Such
events could be identified only by domain experts. For example, the step
Student links subjects to the majors can be interrupted by the event Too little
humanities subjects were chosen, which reflects a business rule known only to a
domain specialist.
The higher accuracy of the tool does not come at the cost of low precision. The
high level of precision (0.93) reached by the tool means that only a small number
of the generated event names were not sensible. This shows that the tool does not
produce a large overhead which, otherwise, might lead to additional work of
rejecting unnecessary events by the Analyst.

The speed is another advantage of the tool. It was able to generate events for the
whole benchmark specification (consisting of over 150 use-case steps) in under
20 minutes, while only five participants of the experiments were able to analyse the
whole specification within 60 minutes.
7.6.7 Turing test
Every generated event description was analysed by an expert in order to evaluate if
the given event name made sense. However, a question arises if the generated event
descriptions are easy to understand by people. In order to evaluate the understand-
ability of the generated event descriptions, an experiment based on the assumptions
of a Turing test [117] was conducted.
Experiment design
This section presents the design of the conducted experiment.
Participants The experiment was conducted with 12 students of the Software Engineer-
ing programme at Poznan University of Technology. The students were familiar with
the concept of use cases and use-case events. All of the classes of the Software En-
gineering programme are conducted in English; hence, it was assumed that the stu-
dents could understand text written in English.
Experimental material The material used in the experiment consisted of two parts: a
form and a questionnaire. The form held a set of 20 pairs of descriptions of use-case
events (see Appendix A). Every pair consisted of an event description generated by
the tool and an event description identified by a human in the experiments mentioned
earlier; in the case of the latter, the best available description (from the language
point of view) was chosen, based on the expert evaluation. The event descriptions
in every pair represented the same abstract event. The order of the event descriptions
in the pairs was random. Figure 7.8 presents an example of a pair of event descriptions used in
the experiment. The task of each subject was to decide which of the event descrip-
tions from the given pair is better from the linguistic point of view. The participants
could choose one of the following options:
• event A is much better than event B
• event A is slightly better than event B
• it is hard to say which event is better
• event B is slightly better than event A
• event B is much better than event A
The questionnaire was prepared as a measurement instrument for controlling
possible influence factors such as experience with use cases, understandability of
the events, etc.
Procedure Before the main experiment a preliminary experiment (with 3 partic-
ipants) was conducted in order to evaluate the proposed experimental material and
the time required to perform the task. Some of the comments from the participants
of this preliminary experiment were incorporated into the final version of the mate-
rials. The results of this preliminary experiment were not taken into account in the
final results. The main experiment started with a 5-minute introduction to the ex-
periment. The experimenter demonstrated the form and the task. The participants
were not informed that half of the event descriptions had been generated by a tool. The
participants were asked to read the form and perform the task. They were also asked
to note down the start and finish times. The participants were given 30 minutes to
finish the task. After finishing the task, the participants were asked to fill out the
questionnaire. At the end, all the materials were collected by the experimenter. In
total, the experiment was finished in 35 minutes.

Use-case step: Candidate chooses a major that he/she wants to pay for.
Event A: No major was chosen by Candidate.
Event B: Candidate didn’t choose any major.

Figure 7.8: Example of pair of events used in the Turing test experiment.
Results In the experiment a 5-point scale was used to evaluate the pairs of
events. However, in order to evaluate the results, a 3-point scale was used. For every
pair of events the following was counted:

• how many participants marked the event description generated by the
tool as much better or slightly better than the event identified by a human;

• how many participants could not tell the difference between the event descrip-
tions;

• how many participants marked the event identified by a human as much or
slightly better than the automatically generated event description.
The results for every pair of event descriptions are presented in Figure 7.9. Table 7.7
presents descriptive statistics for the aggregated numbers of participants choosing
one type of events (either the generated ones by the tool or the identified ones by
humans).
          Average   Standard deviation
Tool      6.3       2.85
Humans    4.15      2.66
Table 7.7: Statistics of the results of the Turing test experiment.
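The collapse from the 5-point answer scale to three vote categories can be sketched as below, assuming (for illustration only) that event A in a pair is the tool-generated description; the answer encoding and function name are assumptions, since the actual pair order was randomised.

```python
# Sketch of collapsing the 5-point answer scale to three vote categories,
# assuming event A in the pair is the tool-generated description.

def tally(answers):
    """Count tool-preferred ('T'), undecided ('?') and human-preferred ('H')."""
    to_category = {"A>>B": "T", "A>B": "T", "A?B": "?", "B>A": "H", "B>>A": "H"}
    counts = {"T": 0, "?": 0, "H": 0}
    for answer in answers:
        counts[to_category[answer]] += 1
    return counts

print(tally(["A>>B", "A>B", "A?B", "B>A", "A>B"]))
# -> {'T': 3, '?': 1, 'H': 1}
```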
Interpretation From the presented results it can be observed that in 12 cases
more participants considered the generated event descriptions better than those
provided by humans. In 7 cases more participants preferred the events identified
by humans. In 1 case (event pair id = 5) the numbers were equal. On average, about 6
participants voted for the event descriptions generated by the tool and only about 4
participants voted for the events identified by humans. It can be concluded that the
implemented tool was able to generate event descriptions which might be used in a
real-life use-case-based specification.

Figure 7.9: Results of the Turing test experiment. Event pairs are sorted according to the number
of participants preferring generated event descriptions (ascending order).
7.7 Related work
A number of authors, including Cox [35], Mar [83], Wiegers [120], advocate focusing
on specification of events and exceptional scenarios. Alexander [15] underlines that
“exception cases can often outnumber normal cases and can drive project costs”.
Experiments conducted by him show that using scenarios helps in the identification
of exceptions and events; however, he does not propose any particular method for
events identification. Cox et al. [35] propose the following question in the check-
list for improving requirements quality: “Alternatives and exceptions should make
sense and should be complete. Are they? “. Nevertheless, they do not present any
technique which could help in achieving completeness of events. Similarly, Wiegers
[120] suggests the following question in his inspection checklist “Are all known ex-
ception conditions documented?”.
Letier et al. [80] presented a technique for discovering unforeseen scenarios
which might have harmful consequences if they are not handled by a system. Their
method is focused on looking for implied scenarios and can help in the elabora-
tion of additional system requirements. The presented case studies show that the
method can give good results; however, it cannot be used with use cases written in
a natural language, as it assumes Message Sequence Charts are used as a scenario-
based specification language. Moreover, the proposed technique is manual and re-
quires vast domain knowledge.
Lamsweerde et al. [118] propose an approach for discovering and handling ob-
stacles in requirements. They claim that very often requirements specifications pres-
ent ideal situations and omit abnormal situations which, if not handled properly,
may lead to system failures or unexpected behaviors. Lamsweerde presents tech-
niques which work on requirements defined as users’ goals expressed in a formal
language called KAOS [79]. The proposed techniques allow for systematic identifica-
tion of obstacles from goal specifications. Because the method is suited
for KAOS-based requirements, it allows for inference about possible obstacles. How-
ever, this strict level of formality makes the requirements hard to write and even
harder to read for IT laymen.
Makino et al. [82] propose a method for generation of exceptional scenarios.
Their approach is based on the analysis of differences between slightly different nor-
mal (positive) scenarios. This method assumes that similar normal scenarios should
share similar abnormal scenarios. Although this method can help in automatic anal-
ysis of exceptional scenarios, it works only for requirements written with a scenario
language named SLAF [52], which assumes limited vocabulary and grammar.
Sutcliffe et al. [115] propose an approach based on automatic generation of dif-
ferent scenarios from defined use cases. Those scenarios might include exceptional
situations which the system should handle. The automatic generation of exceptional
cases may look similar to the approach proposed in this thesis; however, the ap-
proach proposed by Sutcliffe et al. assumes that the possible abnormal pathways
are selected manually. These pathways are used for the generation of possible sce-
narios. This shows that this method is not able to generate the possible abnormal
events in an automatic way.
7.8 Conclusions
Event completeness is one of the dimensions of the quality of software require-
ments. The ad hoc approach to events identification provides poor results (see
Section 6.4). The proposed method aimed at events identification, called H4U, was
able to improve the effectiveness of manual events identification; however, the results
were still not satisfactory. Therefore, an automated method for event identification
has been proposed in this thesis. The method has been evaluated and compared
to the H4U method [65]. The following conclusions can be drawn from the results
presented in this thesis:
• The proposed method of automated identification of events can achieve higher
accuracy than manual approaches. The final accuracy was at the level of 89%,
compared to the average accuracy of the H4U method of 26%. The proposed
method also achieved high precision (0.93).
• The automated method is better than the H4U method in terms of speed. On
average it can analyse around 10 steps per minute, while people using H4U
are able to analyse about 1 step per minute.
• The event names generated by the implemented tool could not be differen-
tiated from the event names proposed by humans. This was investigated in
an experiment based on the assumptions of a Turing test, where participants
(students of Software Engineering) had to compare pairs of the proposed events.
This means that the proposed method could potentially be used in industrial
settings.
Chapter 8
Conclusions
The aim of the thesis was to investigate the problem of identification of events in
use cases. The achieved results confirmed the author’s surmise that the process of
identification of use-case events can be automated and can achieve effectiveness
which is not worse than that of manual approaches (in fact, it is even higher).
In particular, the following conclusions can be drawn from the conducted re-
search:
Conclusion#1: 9 bad smells resulting from violations of Cockburn’s guidelines can be
detected using simple NLP-based methods.
As the first step to improve the quality of use-case-based specification a set of
simple NLP-based methods was proposed. Those methods focus on detection of so-
called bad smells, i.e. violations of good practices proposed by Adolph et al. [9] and
Cockburn [31]. The methods included: one specification-level method, four use-
case-level methods and four step-level methods.
The methods were evaluated using 40 use cases. Different kinds of bad smells
could be found in those use-cases. The preliminary case study showed that those
bad smells could be detected by the proposed methods. Examples of how those
methods work are presented in Section 4.5.
Conclusion#2: A HAZOP-based method of manual identification of events in use
cases is more effective than an ad hoc approach.
Events in use cases can be treated as deviations in real-time systems. Based
on this assumption, the HAZOP-based method (called H4U) was proposed and de-
scribed in Chapter 6. H4U proposes new primary keywords based on analysis of
real-life use cases, while it preserves the original secondary keywords from the HA-
ZOP method.
The proposed method was evaluated in two experiments, the first one with 18
students, the second one with 64 IT professionals. As presented in Section 6.4, H4U
performed better than the ad hoc approach in terms of accuracy of events identi-
fication. Students using the H4U method were able to identify 26%, and using the
ad hoc approach 18% of the sensible events. In the case of IT professionals, they
were able to identify 17% of sensible events with the use of the H4U method and
14% with the ad hoc approach. The differences between H4U and ad hoc were sta-
tistically significant and showed that H4U helps in achieving higher effectiveness of
events identification than ad hoc approach.
Moreover, these experiments showed that event identification is not an easy task
and it is not easy to achieve a high level of completeness of events in use cases.
Conclusion#3: Manual identification of events in use cases is not highly effective.
The experiments described in Chapter 6 showed that the maximum accuracy of
events identification was equal to 0.45 and the maximal average accuracy was equal
to 0.26. These results reveal the problem of the low effectiveness of manual identi-
fication of events. Even using a specialised method, based on HAZOP, participants
of the experiments were not able to achieve more than 0.5 accuracy. This presents a
vast space for new solutions which could increase the effectiveness of events identi-
fication in use cases.
Conclusion#4: The automated method of events identification described in Section
7.3 is more effective than the examined manual approaches (ad hoc and H4U).
A method for automatic events identification was proposed in Chapter 7. The
method consists of four phases: preprocessing of information about actors and in-
formation objects; analysing a step using NLP techniques; inferring about the possi-
ble events using the proposed inference rules; generating event descriptions in nat-
ural language using NLG techniques.
In order to evaluate the proposed method, a prototype tool was implemented.
The tool was based on UC Workbench [88] and it used Stanford Parser [37] and
OpenNLP [1] tools for NLP analysis and SimpleNLG [46] for the task of natural lan-
guage generation.
The method was evaluated in two experiments. In the first experiment, the accu-
racy and speed of the automated method were compared to those of the H4U and ad
hoc approaches. The automated method was able to achieve accuracy equal to 0.89, while the
maximum average accuracy for H4U was equal to 0.26 and for ad hoc it was equal to
0.18. Another important measure is precision of identification of events defined in
Section 7.6. For the benchmark requirements specification the achieved precision
was 0.93. As expected, the prototype tool, although not optimised for speed, was
considerably faster than the manual approaches. The tool was able to analyse over
10 steps per minute, while the average speed for H4U was equal to 0.85 and
for ad hoc it was equal to 2.58.
In the second experiment, the understandability of the generated event descrip-
tions was evaluated. That experiment was a kind of Turing test [117]. Twelve stu-
dents were asked to compare pairs of events: one proposed by a human and one
generated by the prototype tool. The task of the participants was to decide which of
the two event descriptions was better from the language point of view. The experi-
ment showed that event descriptions generated by the tool are not worse than those
written by humans.
Future work In this thesis an automated method of events identification has been
proposed (a generator of event descriptions). A question arises whether it is possible
to further develop the method to perform an automatic review of events identified
by a human (a checker of completeness of identified events). The main and most
difficult problem is automatic discovery of equivalences between two sentences ex-
pressed in natural language.
From the practical point of view, it would also be important to extend the pro-
totype tool to allow analysis of use cases written in other natural languages (e.g.
Polish).
Appendix A
Material used in Turing test
experiment
Participant ID: ........................
Start time: ............................

The task: Evaluate (from the language point of view) the given events for the use-case steps. Mark every pair of events with one of the following options:

A >> B - event A is much better than event B
A > B - event A is slightly better than event B
A ? B - it’s hard to say which event is better
B > A - event B is slightly better than event A
B >> A - event B is much better than event A
1.
Step: 2. System presents a list of majors for which admission is available.
Event A: 2.A. No major is defined in the System.
Event B: 2.A. There are no majors for which admission is available
Evaluation: A >> B A > B A ? B B > A B >> A
2.
Step: 3. Candidate enters text of the post.
Event A: 3.A. Incomplete data of post was entered by Candidate.
Event B: 3.A. Candidate did not fill the text of post.
Evaluation: A >> B A > B A ? B B > A B >> A
3.
Step: 3. Administrator provides basic information concerning the admission.
Event A: 3.A. Administrator provides not enough data of admission.
Event B: 3.A. Incomplete data of admission was provided by Administrator.
Evaluation: A >> B A > B A ? B B > A B >> A
4.
Step: 3. System displays the content of the post.
Event A: 3.A. No post is defined in the System.
Event B: 3.A. No post available.
Evaluation: A >> B A > B A ? B B > A B >> A
5.
Step: 5. Administrator links languages available on the major.
Event A: 5.A. No languages chosen by Administrator.
Event B: 5.A. Language was not selected.
Evaluation: A >> B A > B A ? B B > A B >> A
6.
Step: 4. Candidate fills registration data and submits the form.
Event A: 4.A. Candidate provided wrong data.
Event B: 4.A. Wrong data of registration data was filled by Candidate.
Evaluation: A >> B A > B A ? B B > A B >> A
7.
Step: 2. System imports payment entries from the bank.
Event A: 2.A. The bank system is not available
Event B: 2.A. Connection problem occurred.
Evaluation: A >> B A > B A ? B B > A B >> A
8.
Step: 1. Selecting committee selects a Candidate who was qualified and selects an option to verify his data.
Event A: 1.A. No qualified candidate is defined in the System.
Event B: 1.A. No candidates are qualified.
Evaluation: A >> B A > B A ? B B > A B >> A
9.
Step: 3. Candidate chooses a major that he/she wants to pay for.
Event A: 3.A. No major was chosen by Candidate
Event B: 3.A. Candidate didn’t choose any major
Evaluation: A >> B A > B A ? B B > A B >> A
10.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Candidate does not have founds on account.
Event B: 6.A. Not enough money is left on credit card.
Evaluation: A >> B A > B A ? B B > A B >> A
11.
Step: 7. Selection committee confirms payment cancellation.
Event A: 7.A. Lack of confirmation from Selection committee.
Event B: 7.A. Cancelation is not confirmed by committee.
Evaluation: A >> B A > B A ? B B > A B >> A
12.
Step: 3. Candidate provides question.
Event A: 3.A. Incomplete data of question was provided by Candidate
Event B: 3.A. Empty string was provided by candidate
Evaluation: A >> B A > B A ? B B > A B >> A
13.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Credit card lost its validity.
Event B: 6.A. Credit card is expired.
Evaluation: A >> B A > B A ? B B > A B >> A
14.
Step: 3. Administrator confirms removing.
Event A: 3.A. Administrator does not confirm removal.
Event B: 3.A. Lack of confirmation from Administrator.
Evaluation: A >> B A > B A ? B B > A B >> A
15.
Step: 5. System presents the list of current issues.
Event A: 5.A. No issue is defined in the System.
Event B: 5.A. The list of current issues is empty.
Evaluation: A >> B A > B A ? B B > A B >> A
16.
Step: 6. Selecting committee analyses application as an issue.
Event A: 6.A. Too late for performing analyse operation.
Event B: 6.A. Selecting committee provides the analysis later than it was defined in the process.
Evaluation: A >> B A > B A ? B B > A B >> A
17.
Step: 6. Candidate provides credit card data and confirms payment.
Event A: 6.A. Provided data is incorrect.
Event B: 6.A. Wrong data of credit card was provided by Candidate.
Evaluation: A >> B A > B A ? B B > A B >> A
18.
Step: 3. Administrator provides information and confirms creation of the subject
Event A: 3.A. Administrator duplicated existing subject.
Event B: 3.A. Subjects already exist in the System.
Evaluation: A >> B A > B A ? B B > A B >> A
19.
Step: 3. Administrator defines an algorithm in the algorithm editor.
Event A: 3.A. Algorithm already exists in the System.
Event B: 3.A. The same algorithm exists
Evaluation: A >> B A > B A ? B B > A B >> A
20.
Step: 2. Selecting committee chooses one of the admissions from the available in the system.
Event A: 2.A. No admission is defined in the System.
Event B: 2.A. There are no admissions available in the system.
Evaluation: A >> B A > B A ? B B > A B >> A
Finish time: .......................
Bibliography
[1] OpenNLP homepage http://opennlp.apache.org, last checked: 4th June 2013.
[2] Pablo Picasso’s quote. http://www.brainyquote.com/quotes/quotes/p/pablopicas102018.html, last checked: 1st August 2013.
[3] Stanford Log-linear Part-Of-Speech Tagger. http://nlp.stanford.edu/software/tagger.shtml, last checked: 4th June 2013.
[4] Stanford Tokenizer. http://nlp.stanford.edu/software/tokenizer.shtml, last checked: 4th June 2013.
[5] The Stanford Parser: A statistical parser. http://nlp.stanford.edu/software/lex-parser.shtml, last checked: 4th June 2013.
[6] Use Cases Database (UCDB). http://www.ucdb.cs.put.poznan.pl, last checked: 31st August 2013.
[7] Chaos: A recipe for success. Technical report, Standish Group, 2004.
[8] John Aberdeen, John Burger, David Day, Lynette Hirschman, Patricia Robinson, and Marc Vilain. MITRE: description of the Alembic system used for MUC-6. In Proceedings of the 6th conference on Message understanding, MUC6 ’95, pages 141–155, Stroudsburg, PA, USA, 1995. Association for Computational Linguistics.
[9] S. Adolph, P. Bramble, A. Cockburn, and A. Pols. Patterns for Effective Use Cases. Addison-Wesley, 2002.
[10] Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. In Advances in Neural Information Processing Systems 15, 2003.
[11] A.J. Albrecht. Measuring application development productivity. Proceedings of the Joint SHARE/GUIDE/IBM Application Development Symposium, pages 83–92, October 1979.
[12] A.J. Albrecht. AD/M Productivity Measurement and Estimate Validation. IBM Corporate Information Systems and Administration, 1984.
[13] B. Alchimowicz, J. Jurkiewicz, M. Ochodek, and J. Nawrocki. Building benchmarks for use cases. Computing and Informatics, 29(1):27–44, 2010.
[14] Bartosz Alchimowicz, Jakub Jurkiewicz, Jerzy Nawrocki, and Mirosław Ochodek. Towards use-cases benchmark. In Software Engineering Techniques, volume 4980 of Lecture Notes in Computer Science, pages 20–33. Springer Verlag, 2011.
[15] I. Alexander. Scenario-driven search finds more exceptions. In Proceedings of the 11thInternational Workshop on Database and Expert Systems Applications, DEXA ’00, pages 991–,Washington, DC, USA, 2000. IEEE Computer Society.
[16] I. Alexander. Misuse cases: Use cases with hostile intent. IEEE Software, 20(1):58–66, 2003.
[17] Karen Allenby and Tim Kelly. Deriving safety requirements using scenarios. In in the 5th IEEEInternational Symposium on Requirements Engineering(RE’01), pages 228–235. Society Press,2001.
[18] Bente Anda and Dag I. K. Sj. Towards an inspection technique for use case models. InProceedings of the 14th international conference on Software engineering and knowledgeengineering, pages 127–134, 2002.
[19] David Baccarini, Geoff Salm, and Peter E. D. Love. Management of risks in information technology projects. Industrial Management and Data Systems, 104(4):286–295, 2004.
[20] Adam L. Berger, Vincent J. Della Pietra, and Stephen A. Della Pietra. A maximum entropy approach to natural language processing. Comput. Linguist., 22(1):39–71, March 1996.
[21] B. Bernardez, A. Duran, and M. Genero. Empirical Evaluation and Review of a Metrics-Based Approach for Use Case Verification. Journal of Research and Practice in Information Technology, 36(4):247–258, 2004.
[22] Stefan Biffl and Michael Halling. Investigating the defect detection effectiveness and cost benefit of nominal inspection teams. IEEE Transactions on Software Engineering, 29(5):385–397, May 2003.
[23] D. Bjørner and C. B. Jones, editors. The Vienna Development Method: The Meta-Language, volume 61 of Lecture Notes in Computer Science. Springer-Verlag, 1978.
[24] Leo Breiman, Jerome Friedman, Charles J. Stone, and R. A. Olshen. Classification and Regression Trees. Chapman and Hall/CRC, 1st edition, January 1984.
[25] John M. Carroll. Making Use: Scenario-Based Design of Human-Computer Interactions. MIT Press, Cambridge, MA, USA, 2000.
[26] John M. Carroll, Mary Beth Rosson, George Chin, Jr., and Jürgen Koenemann. Requirements development in scenario-based design. IEEE Trans. Softw. Eng., 24(12):1156–1170, December 1998.
[27] Ronald S. Carson. A set theory model for anomaly handling in system requirements analysis.In Proceedings of NCOSE, 1995.
[28] Ronald S. Carson. Requirements completeness: A deterministic approach. In Proceedings of the 8th Annual International INCOSE Symposium, 1998.
[29] Chemical Industries Association. A Guide to Hazard and Operability Studies, 1977.
[30] Alicja Ciemniewska, Jakub Jurkiewicz, Jerzy Nawrocki, and Łukasz Olek. Supporting use-case reviews. In 10th International Conference on Business Information Systems, volume 4439 of LNCS, pages 424–437. Springer Verlag, 2007.
[31] A. Cockburn. Writing Effective Use Cases. Addison-Wesley Boston, 2001.
[32] J. Cohen. Statistical power analysis. Current Directions in Psychological Science, 1(3):98–101,1992.
[33] Mike Cohn. User Stories Applied: For Agile Software Development. Addison Wesley Longman Publishing Co., Inc., Redwood City, CA, USA, 2004.
[34] Larry L. Constantine and Lucy A. D. Lockwood. Software for use: a practical guide to the models and methods of usage-centered design. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1999.
[35] Karl Cox, Aybüke Aurum, and D. Ross Jeffery. An experiment in inspecting the quality of use case descriptions. Journal of Research and Practice in Information Technology, 36(4), 2004.
[36] Marie-Catherine de Marneffe and Christopher D. Manning. Stanford typed dependencies manual.
[37] Marie-Catherine de Marneffe, Bill MacCartney, and Christopher D. Manning. Generating typed dependency parses from phrase structure parses. In LREC 2006, 2006.
[38] Marie-Catherine de Marneffe and Ch. D. Manning. The Stanford typed dependencies representation. In Proceedings of the Coling 2008 Workshop on Cross-Framework and Cross-Domain Parser Evaluation, pages 1–8, 2008.
[39] C. Denger, B. Paech, and B. Freimut. Achieving high quality of use-case-based requirements. Informatik-Forschung und Entwicklung, 20(1):11–23, 2005.
[40] Steven J. DeRose. Grammatical category disambiguation by statistical optimization. Comput. Linguist., 14(1):31–39, January 1988.
[41] S. Diev. Use cases modeling and software estimation: Applying Use Case Points. ACM SIGSOFT Software Engineering Notes, 31(6):1–4, 2006.
[42] Chuan Duan, Paula Laurent, Jane Cleland-Huang, and Charles Kwiatkowski. Towards automated requirements prioritization and triage. Requir. Eng., 14(2):73–89, 2009.
[43] T. Fawcett. ROC graphs: Notes and practical considerations for data mining researchers. 2003.
[44] Martin Fowler, Kent Beck, John Brant, William Opdyke, and Don Roberts. Refactoring: Improving the Design of Existing Code. Addison-Wesley, 1999.
[45] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.
[46] Albert Gatt and Ehud Reiter. SimpleNLG: a realisation engine for practical applications. In Proceedings of the 12th European Workshop on Natural Language Generation, ENLG '09, pages 90–93, Stroudsburg, PA, USA, 2009. Association for Computational Linguistics.
[47] Martin Glinz. Improving the quality of requirements with scenarios, 2000.
[48] Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning. Multiword expression identification with tree substitution grammars: a parsing tour de force with French. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '11, pages 725–735, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
[49] Constance L. Heitmeyer, Ralph D. Jeffords, and Bruce G. Labaw. Automated consistency checking of requirements specifications. ACM Trans. Softw. Eng. Methodol., 5(3):231–261, 1996.
[50] Hewlett-Packard. Applications of Software Measurement Conference, 1999.
[51] M. Höst, B. Regnell, and C. Wohlin. Using students as subjects—a comparative study of students and professionals in lead-time impact assessment. Empirical Software Engineering, 5(3):201–214, 2000.
[52] Zhang Hong Hui and Atsushi Ohnishi. Integration and evolution method of scenarios from different viewpoints. In Proceedings of the 6th International Workshop on Principles of Software Evolution, IWPSE '03, pages 183–, Washington, DC, USA, 2003. IEEE Computer Society.
[53] R. Hurlbut. A Survey of Approaches for Describing and Formalizing Use Cases. Expertech, Ltd,1997.
[54] IEEE. IEEE Standard for Software Reviews (IEEE Std 1028-1997), 1997.
[55] IEEE. IEEE Recommended practice for software requirements specifications (IEEE Std830–1998), 1998.
[56] ISO. ISO/IEC 25010: 2011, Systems and software engineering–Systems and software QualityRequirements and Evaluation (SQuaRE)-System and software quality models, 2011.
[57] ISO/IEC. ISO/IEC 9126: Software engineering — Product quality — Parts 1–4, 2001–2004.
[58] Jonathan Jacky. The Way of Z: Practical Programming with Formal Methods. Cambridge University Press, November 1996.
[59] I. Jacobson. Concepts for modeling large real time systems. Royal Institute of Technology, Dept. of Telecommunication Systems-Computer Systems, 1985.
[60] I. Jacobson. Object-oriented development in an industrial environment. ACM SIGPLAN Notices, 22(12):183–191, 1987.
[61] I. Jacobson, M. Christerson, P. Jonsson, and G. Övergaard. Object-Oriented Software Engineering: A Use Case Driven Approach. Addison Wesley Longman, Inc, 1992.
[62] Ivar Jacobson. Use cases - yesterday, today, and tomorrow. Technical report, Rational Software,2002.
[63] Aleksander Jarzebowicz. A method for anomaly detection in selected models of information systems. PhD thesis, Gdansk University of Technology, 2007.
[64] Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.
[65] J. Jurkiewicz, J. Nawrocki, M. Ochodek, and T. Glowacki. HAZOP-based identification of events in use cases: an empirical study. Empirical Software Engineering, Accepted Manuscript, 2013.
[66] J. Jurkiewicz, M. Ochodek, and B. Walter. Specyfikacja wymagań dla systemów informatycznych (Guidelines for Developing SRS). © Copyrights by Poznan City Hall, 2009. http://www.se.cs.put.poznan.pl/projects/consulting/um-srs, last checked: 30th March 2011.
[67] Mayumi Itakura Kamata and Tetsuo Tamai. How does requirements quality relate to project success or failure? In Requirements Engineering Conference, 2007. RE '07. 15th IEEE International, pages 69–78, 2007.
[68] Leon A. Kappelman, Robert McKeeman, and Lixuan Zhang. Early warning signs of IT project failure: The dominant dozen. EDPACS, 35(1):1–10, January 2007.
[69] G. Karner. Metrics for objectory. No. LiTH-IDA-Ex-9344:21. Master's thesis, University of Linköping, Sweden, 1993.
[70] R. Kazman, M. Klein, and P. Clements. ATAM: Method for Architecture Evaluation. Technical report, Carnegie Mellon University, Software Engineering Institute, 2000.
[71] Mark Keil, Amrit Tiwana, and Ashley A. Bush. Reconciling user and project manager perceptions of IT project risk: a Delphi study. Inf. Syst. J., 12(2):103–120, 2002.
[72] Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 423–430, 2003.
[73] John C. Knight and E. Ann Myers. An improved inspection technique. Commun. ACM,36(11):51–61, 1993.
[74] Per Kroll and Philippe Kruchten. The rational unified process made easy: a practitioner's guide to the RUP. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2003.
[75] Philippe Kruchten. The Rational Unified Process: An Introduction (3rd Edition). Addison-Wesley Professional, 3rd edition, 2003.
[76] H.G. Lawley. Operability studies and hazard analysis. Chemical Engineering Progress, 70(4):45, 1974.
[77] Beum-Seuk Lee and Barrett R. Bryant. Automated conversion from requirements documentation to an object-oriented formal specification language. In Proceedings of the 2002 ACM Symposium on Applied Computing, SAC '02, pages 932–936, New York, NY, USA, 2002. ACM.
[78] Dean Leffingwell. Calculating the return on investment from more effective requirements management. American Programmer, 10:13–16, 1997.
[79] Emmanuel Letier. Reasoning about agents in goal-oriented requirements engineering, 2001.
[80] Emmanuel Letier, Jeff Kramer, Jeff Magee, and Sebastián Uchitel. Monitoring and control in scenario-based requirements analysis. In ICSE, pages 382–391, 2005.
[81] Luisa Mich, Mariangela Franch, and Pierluigi Inverardi. Market research for requirements analysis using linguistic tools. Requir. Eng., 9(1):40–56, February 2004.
[82] Masayuki Makino and Atsushi Ohnishi. A method of scenario generation with differential scenario. In RE, pages 337–338, 2008.
[83] Brian W. Mar. Requirements for development of software requirements. In Proceedings of NCOSE, 1994.
[84] J. Martin. Managing the data-base environment. The James Martin books on computer systems and telecommunications. Prentice-Hall, 1983.
[85] Pete McBreen. Creating acceptance tests from use cases.
[86] Kjetil Moløkken and Magne Jørgensen. A review of surveys on software effort estimation. In Proceedings of the 2003 International Symposium on Empirical Software Engineering, ISESE '03, pages 223–, Washington, DC, USA, 2003. IEEE Computer Society.
[87] J. Nawrocki and Ł. Olek. UC Workbench - A tool for writing use cases. In 6th International Conference on Extreme Programming and Agile Processes, volume 3556 of Lecture Notes in Computer Science, pages 230–234. Springer-Verlag, June 2005.
[88] Jerzy Nawrocki and Łukasz Olek. UC Workbench – a tool for writing use cases and generating mockups. In Extreme Programming and Agile Processes in Software Engineering, volume 3556 of LNCS, pages 230–234. Springer, 2005.
[89] Jerzy Nawrocki and Łukasz Olek. Use-cases engineering with UC Workbench. In Krzysztof Zieliński and Tomasz Szmuc, editors, Software Engineering: Evolution and Emerging Technologies, volume 130, pages 319–329. IOS Press, October 2005.
[90] Jerzy Nawrocki, Łukasz Olek, Michal Jasinski, Bartosz Paliswiat, Bartosz Walter, Błażej Pietrzak, and Piotr Godek. Balancing agility and discipline with XPrince. In Rapid Integration of Software Engineering Techniques, volume 3943 of Lecture Notes in Computer Science, pages 266–277. Springer Berlin / Heidelberg, 2006.
[91] Clémentine Nebut, Franck Fleurey, Yves Le Traon, and Jean-Marc Jézéquel. Automatic test generation: A use case driven approach. IEEE Trans. Software Eng., 32(3):140–155, 2006.
[92] C.J. Neill and P.A. Laplante. Requirements engineering: The state of the practice. IEEE Software, 20(6):40–45, 2003.
[93] M. Niazi and S. Shastry. Role of requirements engineering in software development process: An empirical study. In Multi Topic Conference, 2003. INMIC 2003. 7th International, pages 402–407, 2003.
[94] Mirosław Ochodek. Empirical Examination of Use Case Points. PhD thesis, Poznan University of Technology, Poznan, Poland, 2011.
[95] Łukasz Olek. Generating acceptance tests from use cases. Technical Report RA-03/08, Poznan University of Technology, 2008.
[96] Łukasz Olek, Jerzy Nawrocki, Bartosz Michalik, and Mirosław Ochodek. Quick prototyping of web applications. In Lech Madeyski, Mirosław Ochodek, Dawid Weiss, and Jaroslav Zendulka, editors, Software Engineering in Progress, pages 124–137. NAKOM, 2007.
[97] OMG. Business object DTF: Common business objects, OMG document bom/97-11-11, version1.5, February 1997.
[98] OMG. OMG Unified Modeling Language™(OMG UML), superstructure, version 2.3, May 2010.
[99] Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill, 2001.
[100] David J. Pumfrey. The Principled Design of Computer System Safety Analyses, 1999.
[101] Anna N. Rafferty and Christopher D. Manning. Parsing three German treebanks: lexicalized and unlexicalized baselines. In Proceedings of the Workshop on Parsing German, PaGe '08, pages 40–46, Stroudsburg, PA, USA, 2008. Association for Computational Linguistics.
[102] F. Redmill, M. Chudleigh, and J. Catmur. System safety: HAZOP and software HAZOP. Wiley,1999.
[103] D. Rosenberg and M. Stephens. Use Case Driven Object Modeling with UML: Theory and Practice. Apress, Berkeley, CA, USA, 2007.
[104] Serguei Roubtsov and Petra Heck. Use case-based acceptance testing of a large industrial system: Approach and experience report. In Proceedings of the Testing: Academic & Industrial Conference on Practice And Research Techniques, TAIC-PART '06, pages 211–220, Washington, DC, USA, 2006. IEEE Computer Society.
[105] Chris Sauer and Christine Cuthbertson. The State of IT Project Management in the UK 2002–2003. Technical report, Templeton College, University of Oxford, 2003.
[106] Roy Schmidt, Kalle Lyytinen, Mark Keil, and Paul Cule. Identifying software project risks: An international Delphi study. J. Manage. Inf. Syst., 17(4):5–36, March 2001.
[107] Ken Schwaber. Scrum development process. In Proceedings of the 10th Annual ACM Conference on Object Oriented Programming Systems, Languages, and Applications (OOPSLA), pages 117–134, 1995.
[108] F. J. Shull, J. C. Carver, S. Vegas, and N. Juristo. The role of replications in empirical software engineering. Empirical Software Engineering, 13(2):211–218, 2008.
[109] M. Smiałek. Software Development with Reusable Requirements-based Cases. Prace Naukowe - Politechnika Warszawska: Elektryka. Oficyna Wydawnicza Politechniki Warszawskiej, 2007.
[110] S.S. Somé. Supporting use case based requirements engineering. Information and SoftwareTechnology, 48(1):43–58, 2006.
[111] Thitima Srivatanakul, John A. Clark, and Fiona Polack. Effective security requirements analysis: HAZOP and use cases. In Information Security: 7th International Conference, volume 3225 of Lecture Notes in Computer Science, pages 416–427. Springer, 2004.
[112] Tor Stålhane and Guttorm Sindre. A comparison of two approaches to safety analysis based on use cases. In Conceptual Modeling - ER 2007, volume 4801 of Lecture Notes in Computer Science, pages 423–437. Springer Berlin / Heidelberg, 2007.
[113] Tor Stålhane, Guttorm Sindre, and Lydie du Bousquet. Comparing safety analysis based on sequence diagrams and textual use cases. In Advanced Information Systems Engineering, volume 6051 of Lecture Notes in Computer Science, pages 165–179. Springer, 2010.
[114] George E. Stark, Paul W. Oman, Alan Skillicorn, and Alan Ameele. An examination of the effects of requirements changes on software maintenance releases. Journal of Software Maintenance, 11(5):293–309, 1999.
[115] Alistair G. Sutcliffe, Neil A.M. Maiden, Shailey Minocha, and Darrel Manuel. Supporting scenario-based requirements engineering. IEEE Transactions on Software Engineering, 24(12), December 1998.
[116] Hazel Taylor. Critical risks in outsourced IT projects: the intractable and the unforeseen. Commun. ACM, 49(11):74–79, November 2006.
[117] Alan M. Turing. Computing Machinery and Intelligence. Mind, LIX:433–460, 1950.
[118] Axel van Lamsweerde and Emmanuel Letier. Handling obstacles in goal-oriented requirements engineering. IEEE Trans. Softw. Eng., 26(10):978–1005, October 2000.
[119] Brenda Whittaker. What went wrong? Unsuccessful information technology projects. Inf. Manag. Comput. Security, 7(1):23–30, 1999.
[120] Karl E. Wiegers. Inspection checklist for use case documents. In Proceedings of the 9th IEEE International Software Metrics Symposium. IEEE, September 2003.
[121] K.E. Wiegers. Software Requirements. Microsoft Press, Redmond, WA, USA, 2nd edition, 2003.
[122] Willie and Schlechter. Process risk assessment - Using science to ’do it right’. InternationalJournal of Pressure Vessels and Piping, 61(2-3):479–494, 1995.
[123] J. Wood and D. Silver. Joint application development. Wiley, 1995.
[124] K. T. Yeo. Critical failure factors in information system projects. International Journal of Project Management, 20(3):241–246, April 2002.
[125] Eric S. K. Yu. Towards modeling and reasoning support for early-phase requirements engineering. In Proceedings of the 3rd IEEE International Symposium on Requirements Engineering, RE '97, pages 226–, Washington, DC, USA, 1997. IEEE Computer Society.
[126] Hong Zhu and Lingzi Jin. Scenario analysis in an automated tool for requirements engineering. Requir. Eng., 5(1):2–22, 2000.
[127] B.D. Zumbo and A.M. Hubley. A note on misconceptions concerning prospective and retrospective power. Journal of the Royal Statistical Society – Series D: The Statistician, 47(2):385–388, 1998.
© 2013 Jakub Jurkiewicz
Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Typeset using LaTeX in Adobe Utopia. Compiled on November 7, 2013.
BibTEX entry:
@phdthesis{jjurkiewicz2013phd,
  author  = "Jakub Jurkiewicz",
  title   = "{Identification of Events in Use Cases}",
  school  = "Pozna{\'n} University of Technology",
  address = "Pozna{\'n}, Poland",
  year    = "2013",
}