05-Jul-18
1
SATTOSE 2018 HACKATONWhat are the effects of using UML
in Open Source Projects ?
(*) Chalmers | University of Gothenburg(+) Universidad Rey Juan Carlos
Regina Hebig * Gregorio Robles + Miguel-Angel Fernandez +
Truong Ho-Quang * Michel R.V. Chaudron *
Outline of Talk
• Introduction
• Recap UML & Definitions/Terminology
– Model, Design, Diagram
• What do we know about the effects of modeling in software projects?
• Description of Dataset
• Interesting Directions for Research
05-Jul-18
2
Thorsten Berger
Software Engineering Division Staff
Michel Chaudron
Jan Bosch
Christian Berger
Richard Torkar
ImedHammouda
MiroslawStaron
Patrizio Pelliccione
IvicaCrnkovic
RiccardoScandariato
Jan-Philipp
Steghöfer
Regina Hebig
AgnetaNilsson
Eric Knauss
Rogardt Heldal
Empirical Software
Engineering
Software Architecture
Model-driven Software
EngineeringSoftware Metrics
Open Source Systems
Variability and Software Product Lines
Self-Adaptive Systems
Management of Software
Projects
Software Security
Richard Berntsson
JenniferHorkoff
Software Testing/QA
IndustrialCollaboration
GulCalikli
Francisco Oliveira Neto
Robert Feldt
PhilipLeitner
Requirements Engineering
SOFTWARE ENGINEERING CONFERENCES IN GOTHENBURG
Open Source 2016REFSQ 2016ICSA 2017Mensura 2017IFIPTM 2017 Int. Conf. on Trust Management
+ yearly Lindholmen Software Development day (600+)
http://icsa-conferencesorg/2017/
http://www.iwsm-mensura.org/2017
http://wp.portal.chalmers.se/ifiptm2017/
05-Jul-18
3
My Background
• Since 2012 Prof. in Software Engineering in Gothenburg
• Background
– Ph.D. in Formal Methods: Calculi for program derivation
(algebra of parallel processes, refinement) from Leiden, NL
– 2 years in industry as IT consultant
– Assistant & Associate prof. at TU Eindhoven & Leiden, Netherlands
– Software -Architecture, -Design & -Modeling
Component-based SE
• Industrial research collaborations
– Enterprise Architecture, IT Portfolio, Alignment, Agility
– Research Collaborations: EU/ITEA – o.a. Philips, KLM, ING,
Capgemini, and in Sweden: o.a. Ericsson, Volvo
Introduction: Research Interests• What are the pay-offs of investing in
early design/architecture/modeling? Fewer defects? Cheaper maintenance? …
• Analysis and Reasoning about Quality Properties of System Architectures – Automating Architecture Design
– Maintainability, Technical Debt
• Focus on UML in custom software development
– De facto standard
Resource useCPU, Mem, Network
Performance
Reliability
Timeliness
System
Security
Maintainability
05-Jul-18
4
MRV Chaudron SOFSEM 2018
Recap UML : 4+1 Views model
7
Structure view:class/component-diagram
Behaviour view:
Sequence diagram
A B
C D
A B C D
Deployment view:
physical model + mapping
AB
C D
Use cases
BC/WC e2e-response times, freq. bandwidth, availability
TCP/IP over Ethernet
file ownership
Config. Mngnt view
versioning policies
…
Development view
MRV Chaudron SOFSEM 2018
05-Jul-18
5
MRV Chaudron SOFSEM 2018
Modelling History
Juha-Pekka Tolvanen, CEO Meta-Case, Keynote at Code Generation 2014:
The business cases of modeling and generators
MRV Chaudron SOFSEM 2018
Fake UML News – Urban Myths
• UML is the best software notation
• UML is a notation, not a method –
• UML can only be used with object orientation
• UML is not a process
i.e. UML is not tied to waterfall or RUP or SCRUM or …
• Only when used for code generation, is modeling effective
• Using UML guarantees a good design
• Michel Chaudron is a UML advocate
05-Jul-18
6
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Definitions
Model (noun) = an abstract representationof a thing/system
often systematic representation
Models abstract: they focus on the essential features and leave out others.
Modeling = the process of creating a representation
(verb) i.e. choosing what to represent and how to represent it
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Design (verb, noun)
Definition
Design (v) = the process of making decisions about something that is to be built or created:
Design (n) = the plans, drawings, etc., that show how something can be made
Pitfall : ‘design’ & ‘model’ can be a verb and a noun
05-Jul-18
7
MRV Chaudron SOFSEM 2018
Model vs Diagram
Model
• Systematic use of well-defined concepts and notation for representation(intent is to define)
• Notation has consistent semantics (meaning)
• Amenable to (automated) manipulation - such as simulation, analysis – based on meaning of model
Diagram
• Choice of notation to support communication of content (intent is to communicate)
• Notation may be informal, without consistent meaning
It is not just the appearance that distinguishes models and diagrams; more importantly: systematic use of semantics
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
How do ‘modeling’ and ‘design’ relate?
Modeling and designing often go hand-in-hand:
A model is used to understand, reason, analyse, break-down, which leads to adaptation and refinement of the design.
MODELING DESIGNING
05-Jul-18
8
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Ideation ProductionExternalization
Building
Animation
Stages of Design & Modeling
Develop idea Explain to others Blueprint for Production
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Ideation ProductionExternalization
Building
Animation
Software
05-Jul-18
9
MRV Chaudron SOFSEM 2018
Motivation
Why should we care about modeling software design?
Anti-Modeling-camp:
“Models are useless.
Code is the only truth.”
Pro-Modeling-camp:
“I can not imagine we could develop
our systems without any modeling.”
MRV Chaudron SOFSEM 2018
Pro Modeling & Design
Context factors- Complex endeavor- Large project team- Multidisciplinary
expertise- Large scope over time
‘Knowledge Management’, Planning & Coordination
05-Jul-18
10
MRV Chaudron SOFSEM 2018
Summary of Arguments in favour of Modelling in Software Development
+ Communicating /
Coordination
+ Analysing / Predicting
+ Understanding
/ Structuring
+ Guiding
+ Blueprint for Production
MRV Chaudron SOFSEM 2018
To Model or Not to Model?
05-Jul-18
11
Motivation for this study:• Study the effect of modeling on software development
• General belief:
• Modeling has positive effect on quality of design and thus on maintenance
Increasing importance of Open Source
• Open source sw projects often happen in distributed setting.
• Here, diagrams may have an important role for achieving coordinated / shared vision about the system(design)
- NOT: IF modeling is any good, but WHEN and HOW
Modeling in Broader Context
Knowledge Management
Codification Communication
Documentation
UML
Comments E-mailsChats
…
volatile
05-Jul-18
12
M. R.V. Chaudron – May 2011
Modeling & Documentation in Agile Development
• Agile principles: working software over comprehensive documentation
Tending towards: we need a bit more
Christoph J. Stettina and Werner Heijstek, Necessary and Neglected? An Empirical Study of Internal Documentation in Agile
Software Development Teams 29th ACM Int. Conf. on Design of Communication , Pisa, Italy
Modeling is compatible with agile development
Survey under75+ agile developers
Ideation Externalization Production
Main audience:
Target audience of the models?
computerspeople
Ok with informal spec(has context, domain
& background knowledge)
Needsformal and
complete spec
05-Jul-18
13
MRV Chaudron SOFSEM 2018
How to study this ?
• Communicating /
Coordination
• Analysing / Predicting
• Understanding
/ Structuring
• Guiding
• Production
Software Quality- Modularity- Defects
Understandability of designs?
Misunderstandings??
Speed / Efficiency
Lack of Empirical Data
Data
Value/Utility
Knowledge
Projects with UML
discovery
Enrich via labelling:- Reverse vs Fwd design?- Student project?- Domain model vs design model?
Patterns?Which ones to look for?
toolsuse
05-Jul-18
14
What do we know about effects of modeling in software projects?
Industrial Documentation Process
Requirements
Impact &
feasibility
study
Plan
Project
Initiation
Document
Risk
log
Progres
s report
Financial
report
Environmental
test report
Go-live
report
Final
report
Service
manual
document
Instructions
for
development
IT Global
Design
Functional
document
Change
of code
Change of
documentation
Technical
designAutomated
/manual
tests
Test classes
+
Test strategy
Design
storyUse
story
Use
cases
Deployed
app
Project manager
Business
stakeholder
Information
analystArchitect Technical
lead
Developer TesterAnalyst DeployerTest lead
05-Jul-18
15
M. R.V. Chaudron – May 2011
Does the use of modeling improve software quality?
A large number of developers indicated the use of UML improves understandability and modularity
45%41%
0.0
5.0
10.0
15.0
20.0
25.0
30.0
35.0
40.0
45.0
50.0
CoverRequirements
Correctness Modularity Testability Understandability
Percen
tag
e o
f th
e r
esp
on
den
ts
Reduce
Somewhat Reduce
Neutral
Somewhat Improve
Improve
Ari
ad
i N
ug
roh
o, M
ich
el R
. V
. C
hau
dro
n: A
Su
rvey o
f th
e P
ract
ice o
f D
esi
gn
-
Co
de C
orr
esp
on
den
ce a
mo
ng
st P
rofe
ssio
nal
So
ftw
are
En
gin
eers
. ESEM
2007: 467-4
69
Economy of Modeling
Size of the system% o
f sy
ste
m c
ove
red
by
mo
de
l
Based on research together with Hafeez Osman in Leiden
05-Jul-18
16
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Which parts of my designshould I model?
Design models focus on:
- risky parts (complex, unknown)
- key structures / behaviours
More experienced designers leave out more information from the model
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Paym ent
-amount
-cashTendered
Cash
+authorized()
-name
-bankID
Check
+authorized()
-number
-type
-expDate
Credit
+calcTax()
+calcTotal()
+calcTotalWeight()
-date
-status
O rder
+calcSubTotal()
+calcW eight()
-quantity
-taxStatus
OrderDetail
1
1
1* Preprocessor Parser Analyser Inheritance Relator Statistic Filter Statistic CalculatorDB Creator DB Filler DB Checker
SAAT.BAT
Select ‘defects’ in defect DB
Find classes in source code that were
repaired for solving this defects
Find corresponding classes in UML models
Determine LoDfor CD and SD
Paym ent
-amount
-cashTendered
Cash
+authorized()
-name-bankID
Check
+authorized()
-number
-type
-expDate
Credit
+calcTax()
+calcTotal()
+calcTotalWeight()
-date-status
Order
+calcSubTotal()
+calcWeight()
-quantity-taxStatus
OrderDetail
1
1
1* Preprocessor Parser Analyser Inheritance Relator Statistic Filter Statistic CalculatorDB Creator DB Filler DB Checker
SAAT.BAT
component C2
requires I2
requires I3
uses I2.g
uses I3.h
component C2
requires I2
requires I3
uses I2.g
uses I3.h
defectdensity
LoD per class
Relation between UML-LoD and Code Quality
05-Jul-18
17
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Level of Detail for Sequence Diagrams is significantly (negatively) correlated with defect density.
More detailed model => fewer defects
Relation between Level of Detailand Defect Density
= 1 class
What about class diagrams & structural design of software? Nu
gro
ho
, A
riad
i, B
as F
lato
n,
and
Mic
hel
RV
Ch
aud
ron
. "E
mp
iric
al a
nal
ysis
of
the
rela
tio
n b
etw
een
lev
el o
f d
etai
l in
UM
L m
od
els
and
def
ect
den
sity
."
InIn
tern
ati
on
al
Co
nfe
ren
ce o
n M
od
el D
rive
n E
ng
inee
rin
g L
an
gu
ag
es a
nd
Sys
tem
s, p
p.
60
0-6
14
. Sp
rin
ger,
Ber
lin, H
eid
elb
erg,
20
08
.
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
UML as vehicle for communication• Experiment: subjects were asked to chose the source code
that matched a UML model (multiple choice).Some questions contained ‘defects’, others not.
• Subjects were 111 students and 48 professional developers (experienced with UML)
• Defects in UML models are detected only to a small degree
• Subjects like to use domain knowledge • either to fill in ‘gaps’, or instead of detailed reading (lazy!)
Effects of Defects in UML models, Lange, C. F.J.; Chaudron, M.R.V., ICSE 2006. Shanghai
05-Jul-18
18
M. R.V. Chaudron – May 2011
Improved
Quality
UML Modeling
Fewer
Defects
Reduced
maintenance
effort
Better
Understanding
Problem Domain
Reduced
testing
effort
Reduced
rework
More efficient
Maintenance
“validation”
building the
right system
“verification”
building the
system right
Improved
requirements
Better Risk
management
Improved
Productivity
More
Accurate
Estimating
Improved Design
Compliance
Theory of Benefits of Modeling
Developer
Benefits
Process
Benefits
Product
Benefits
Project Management
Benefits
Improved
Design Quality
Shared
System Model
Common
Language
Community
Benefits
Improved
Communication
Better
Understanding
Solution Space
More efficient
Testing
Chaudron, M. R., Heijstek, W., & Nugroho, A. (2012). How effective is
UML modeling?. Software & Systems Modeling, 11(4), 571-580.
Improved
knowledge/awareness of
design / construction
practices
M. R.V. Chaudron – May 2011
Model of Impact
Software Project
Software Quality
Software Architecture/
Design Documentation
05-Jul-18
19
M. R.V. Chaudron – May 2011
Model of Impact
Software Project
Software Quality
Software Architecture/
Design Documentation Efficiency of
Development Process
Quality of Design& Documentation
Quality of People
Effectiveness of Use
• Ease of finding• Effective sharing• Effective updating
• Ease of understanding effective collaboration/knowledge
sharing
Description of the dataset
05-Jul-18
20
Modeling OSS?!
UML?
UML?
UML?
Methodology
File formats: images (.png, .svg, …), .uml,
.xmi
GitHub
1 Data collection
Potential UML file list
2 Filter UML files
UML Image Classifier
Textual Filter
Validation
GHTorrent
UML File list
LindholmenDataset
First results on HOW Model are used: “The quest for open source projects that use UML: mining GitHub”MoDELS 2016, R. Hebig, T. Ho-Quang, M. Chaudron, G. Robles, M. Fernández
05-Jul-18
21
First results on HOW Model are used: “The quest for open source projects that use UML: mining GitHub”MoDELS 2016, R. Hebig, T. Ho-Quang, M. Chaudron, G. Robles, M. Fernández
Methodology+12 000 000 non-
forked repos (100% of all GitHub projects)
Identified +93 000 UML files
+24000 repositories
July 2015
July 2016
GitHub
1 Data collection
Potential UML file list
2 Filter UML files
UML Image Classifier
Textual Filter
Validation
GHTorrent
UML File list
LindholmenDataset
Extracting Model from Images
Input: .jpeg, .png
Bilal
Recognizing:- Rectangles- Lines- Text,- arrow-heads
Output: XMI for the class model
Karasneh, Bilal, and Michel RV Chaudron. "Img2uml: A system for extracting uml models from images." In Software Engineering and Advanced Applications (SEAA), 2013 39th EUROMICRO Conference on, pp. 134-137. IEEE, 2013.
05-Jul-18
22
Example analysis: size of class diagrams
0
100
200
300
400
500
600
0 3 6 9 12
15
18
21
24
27
30
33
36
39
42
45
48
52
55
58
61
64
69
72
75
78
83
88
95
99
113
125
136
151
165
169
244
337
1583
No.ofdiagrams
No.ofclasses
Distributionofdiagramsbynumberofclasses
Countries of Projects that use UML
05-Jul-18
23
SATTOSE dataset
• data of 2,500 OSS projects with more than 13,500 UML models.
• Formatted as a MySQL database
• Size of the MySQL self-contained dump-file is about 100MB (zip)
• URL: goo.gl/RaC8PB
• http://oss.models-db.com/Downloads/SATToSE2018_Hackathon.
Data-schema
Detailed description of the database schema can be found
at http://oss.models-db.com/Downloads/SATToSE2018_Hackathon.
05-Jul-18
24
05-Jul-18
26
Some questions to get started• Is design (UML) done upfront or is it done after the
fact?
• How detailed is the design documentation?
• How up to date is the design documentation?
• How does UML help with onboarding? Not necessarily more new people joining the project, but maybe getting the ones that join up to speed quicker?
More questions to get started• Does the quality of documentation impact the quality of
the (design structure of) the implementation?– such as modularity?
• Does UML help decrease defect repair times?• Are design docs more useful early in the creation stage of
a project than later on, when things mature / stabilize• Do code comments serve as an alternative to design
documentation for sharing knowledge?• People learn differently. Who on a team benefits most
from design docs?
I propose you try to work in pairs or triples.Present your work on Friday!
05-Jul-18
27
• Can we find modeling practices?
• Software Processes?
– Any effect on communication patterns?
– Efficiency/Speed
• On-boarding?
• Software Quality?
– Defects? Modularity?
• Can we assess the quality of – Knowledge sharing?
• Which knowledge is codified? Which is needed?
• Which ’channels’? How often used? Evolution over time?Novices vs experienced contributors?
– Documentation?• Code-level
• API’s
• Architecture/design level
05-Jul-18
28
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Understanding the Modeling Process
Series 1: Code commitsSeries 2: Documentation commitsSeries 3: UML/model commits
05-Jul-18
29
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
Understanding the Modeling Process
Series 1: Code commitsSeries 2: Documentation commitsSeries 3: UML/model commits
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
File type .uml xmi svg png jpeg bmp jpg gif
Share 34% 4% 1% <1% <1% 10% 13%
Distribution of UML files by file type
NB. We did not yet look into: pdf, word, ppt, specific
tooling-formats, ASCII
60%
37%
image
formats
05-Jul-18
30
MRV Chaudron SOFSEM 2018
M. R.V. Chaudron – May 2011
#UML files 1 [2-9] [10-99] [100,∞)
#Repositories 14 749 8 567 1 333 68
Projects grouped by number of UML
files
MRV Chaudron SOFSEM 2018
Some arguments against modeling
1. Creating models takes too much time
a. We can do the same without models
2. Maintaining models takes too much time
3. Notation is too complex
4. Notation is not expressive enough
05-Jul-18
31
MRV Chaudron SOFSEM 2018
Some arguments against modeling
1. Creating models takes too much time!
Experiments show that modeling a design for a medium to large system takes 1 or 2 (person)days.
This is insignificant on the scale of projects.
a. We can do the same without modelling.
Much of what is seen as modeling effort, is actually design-activity.Actual evidence is missing
MRV Chaudron SOFSEM 2018
2. Maintaining models takes too much time
Possibly.
It is unlikely that the actual updating takes too much effort. More likely navigating documentation and checking completeness of updates.
+ Clumsy practices: no naming conventions ,no version control, no quality assurance, …
05-Jul-18
32
MRV Chaudron SOFSEM 2018
3. Notation is too complex
Experts use lots of aspects of notation. Beginners only few.
In large projects, teams prefer ‘least common denominator’ in notation: only use what everyone understands.
4. Notation is not expressive enough
Unlikely – no actual evidence.
More likely: immature usability of tooling: mixing diagrams, with sketches, annotations, linking/referencing to/synchronizing wit source code, …
MRV Chaudron SOFSEM 2018
Use Reverse Engineering i.s.o. Design
Pro’s:
• Always up to date
Con’s:
• Too much detail
• No automated abstraction
– What are the essential parts?
• Currently as time-consuming as creating FWD designs
Top Related