ICSME 2016 keynote: An ecosystemic and socio-technical view on software maintenance and evolution
An Ecosystemic and Socio-Technical View on Software
Maintenance & Evolution
Tom Mens @tom_mens
COMPLEXYS Research Institute
University of Mons, Belgium
-1999 PhD @VUB
1999-2003 Postdoc @VUB
2003-now (full) professor
Research topics:
OO design & refactoring
MDSE, model transformation
Empirical research of software ecosystems
[Timeline figure; date labels: 2004, 2008, 1994-2004, 1998-2004, 2010-now]
Research Collaborators
Research Context
2012-2017 ongoing research project
“Ecological Studies of Open Source Software Ecosystems”
- Interdisciplinary research
- Use ideas from biological ecology to understand and improve evolution of software ecosystems
A software ecosystem is a collection of software projects that are developed
and evolve together in the same environment.
Mircea Lungu (PhD, 2008)
Software Ecosystem Examples
Gnome
CRAN
Debian Ubuntu KDE
JavaScript Ruby
When things go wrong…
CRAN
Credits: http://www.designandanalytics.com/cran-gephi/
Package dependency graph
> 9K active packages, > 21K dependencies in April 2016
CRAN
• Increasing number of R packages hosted on GitHub
“non-transparent nature of the CRAN submission / rejection process”
“CRAN […] is revealing some limitations of the current design. One such problem is the general lack of dependency versioning in the infrastructure.”
• Problems with breaking dependencies
“It is more and more of a pain if the package I’m depending on breaks”
“One recent example was the forced roll-back of the ggplot2 update to version 0.9.0, because the introduced changes caused several other packages to break.”
Decan et al. “When GitHub Meets CRAN: An Analysis of Inter-Repository Package Dependency Problems.” SANER 2016
JavaScript: > 317K packages, > 728K dependencies in June 2016
JavaScript
• Deliberate desire to distribute micropackages
• Lots of dependencies on micropackages
Example: isarray (150 direct, 77K transitive in-deps in Aug 2016)

    var toString = {}.toString;
    module.exports = Array.isArray || function (arr) {
      return toString.call(arr) == '[object Array]';
    };

David Haney’s code blog, March 2016
http://www.haneycodes.net/npm-left-pad-have-we-forgotten-how-to-program/
Example: leftpad
• Package leftpad

    function leftpad (str, len, ch) {
      str = String(str);
      var i = -1;
      if (!ch && ch !== 0) ch = ' ';
      len = len - str.length;
      while (++i < len) {
        str = ch + str;
      }
      return str;
    }

• What happened?
  – Its developer unpublished all his modules from npm
“This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”
http://blog.npmjs.org/post/141577284765/kik-left-pad-and-npm
Departure of a central contributor
• All bug handling became concentrated in 1 contributor
• Contributor suddenly left project, being dissatisfied
• Lasting negative impact on bug handling performance
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013
Strict policy and tools for ensuring backward compatibility
• “Prime Directive: When evolving the Component API from release to release, do not break existing clients”
Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016
May lead to stagnation and drive away developers
– Coordination around synchronized yearly releases slows down development
“If you have hip things, then you get people who create new APIs on top of that […] These things don’t happen on the Eclipse platform anymore.”
“you have to be very patient and know who to talk with […] in order to get your patches accepted, and I think it’s very intimidating for some new people to come on.”
Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016
Socio-Technical View
• Software ecosystems suffer from problems because of technical factors, social reasons, or both.
• A socio-technical view is therefore essential for software ecosystem evolution research.
Socio-Technical View
• Socio-technical analyses can benefit from mixed method research
  – Combine quantitative and qualitative methods into a single study
    • Empirical analysis of objective data
    • User surveys and interviews
  – Exploiting their complementarity increases confidence in the findings
Johnson et al. Mixed methods research: A research paradigm whose time has come. Educational Researcher 33(7): 14–26, 2004
Software Ecosystem (SECO) Research Challenges
Understanding SECOs
• How are SECOs structured?
• What are their tools, habits, values, boundaries?
• How do they emerge and evolve over time?
• What are the mechanisms driving their dynamics?
• How do different SECOs compare?
• How to face technical challenges?
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
Software Ecosystem Research Challenges
Supporting SECO communities
• How can they be made more sustainable and resilient?
• How can we predict their evolution?
• How can we improve the SECO?
  – In terms of productivity, quality, diversity, maintainability, survival, popularity, …
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
Supporting SECOs: Increasing resilience & sustainability
Can the SECO
• resist major disturbances?
• return to a stable equilibrium after a major disturbance?
Possible approach:
• Estimate, predict and reduce risk of bus factor
Bus factor: Social view
Specific activity concentrated in few persons.
Examples:
– Single person responsible for bug handling in Gentoo
– Only one developer knows some part of the code
Bus factor: Technical view
Too many software components depend on a single software component.
– Makes components more brittle to future changes
– npm leftpad example
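The technical view of the bus factor can be illustrated by counting how many packages depend on a given package, directly or transitively. The following is a minimal sketch, assuming a toy dependency map; the package names other than leftpad are hypothetical and the data is not taken from npm.

```javascript
// Hypothetical toy dependency map: package -> packages it depends on.
const dependencies = new Map([
  ["app", ["web-framework", "leftpad"]],
  ["web-framework", ["line-numbers"]],
  ["line-numbers", ["leftpad"]],
  ["cli", ["line-numbers"]],
  ["leftpad", []],
]);

// Invert the map: package -> packages that depend on it directly.
function reverseGraph(deps) {
  const rev = new Map();
  for (const pkg of deps.keys()) rev.set(pkg, []);
  for (const [pkg, targets] of deps) {
    for (const target of targets) {
      if (!rev.has(target)) rev.set(target, []);
      rev.get(target).push(pkg);
    }
  }
  return rev;
}

// Breadth-first search over reverse edges collects every package that
// would be affected, directly or transitively, if `pkg` disappeared.
function transitiveDependents(deps, pkg) {
  const rev = reverseGraph(deps);
  const seen = new Set();
  const queue = [pkg];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const dependent of rev.get(current) || []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return seen;
}

const direct = reverseGraph(dependencies).get("leftpad").length;
const affected = transitiveDependents(dependencies, "leftpad");
console.log(`direct: ${direct}, transitive: ${affected.size}`);
```

In this toy graph a package with only two direct dependents still reaches four packages transitively, which is the same effect, at a much smaller scale, as the isarray and leftpad figures quoted earlier.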
Bus factor
Active area of research
At least 4 GitHub projects compute (social) bus factor.
Cosentino et al. “Assessing the bus factor of Git repositories.” SANER 2015
Avelino et al. “A novel approach for estimating truck factors.” ICPC 2016
Bus factor
Experimental support on GitHub
https://libraries.io/bus-factor
Bus factor
https://dependencyci.com
Supporting SECOs: Improving quality
By increasing technical wealth through reducing technical debt
“a concept in programming that reflects the extra development work that arises when code that is easy to implement in the short run is used instead of applying the best overall solution” (Ward Cunningham, 1992)
http://legacycoderocks.libsyn.com/technical-wealth-with-declan-wheelan
Implementation of SQALE model in SonarQube
Supporting SECOs: Improving quality
Social view: Reducing social debt
“Unforeseen project cost connected to sub-optimal organizational-social structures”
Supporting SECOs: Improving quality
Reducing social debt by removing community smells
– Organisational silo
  • High decoupling and lack of communication between tasks
– Black cloud
  • Lack of people able to bridge the knowledge and experience gap between distinct communities
– Prima-donnas
  • Seemingly condescending and egotistical behaviour, irreceptiveness to collaboration
– Sharing villainy
  • Lack of knowledge exchange incentives
– Organisational skirmish
  • Misalignment of organisational cultures between distinct communities
– …
Interdisciplinary research
“Many challenges we face are not solvable by people remaining in their single discipline silos”…
www.newscientist.com/article/mg20928002-100-open-your-mind-to-interdisciplinary-research/
Interdisciplinary research
“bringing […] disciplines together in the long term is what provides the big, big breakthroughs”
Interdisciplinary research: Social Network Analysis (SNA)
Social Network Analysis
Social network centrality measures
Degree
  Number of in- or outgoing dependencies of a node.
Betweenness
  Quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.
Closeness
  The more central a node, the lower its total distance from all other nodes.
Eigenvector centrality and PageRank
  Measure the influence of a node in a network.
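To make the first three measures concrete, here is a minimal sketch that computes degree, closeness and betweenness for a small network. It assumes an undirected, unweighted, connected graph (the slide's in/out-degree would need a directed variant), uses hypothetical contributor names, and leaves eigenvector centrality and PageRank out. Betweenness follows the usual shortest-path accumulation (Brandes-style).

```javascript
// Hypothetical collaboration network: undirected, unweighted edges.
const edges = [["ann", "bob"], ["bob", "cat"], ["cat", "dan"], ["bob", "dan"], ["dan", "eve"]];

function buildAdjacency(edgeList) {
  const adj = new Map();
  for (const [u, v] of edgeList) {
    if (!adj.has(u)) adj.set(u, new Set());
    if (!adj.has(v)) adj.set(v, new Set());
    adj.get(u).add(v);
    adj.get(v).add(u);
  }
  return adj;
}

// Degree: number of neighbours of each node.
function degree(adj) {
  return new Map([...adj].map(([node, nbrs]) => [node, nbrs.size]));
}

// Closeness: (n - 1) divided by the total BFS distance to all other nodes.
function closeness(adj) {
  const result = new Map();
  for (const source of adj.keys()) {
    const dist = new Map([[source, 0]]);
    const queue = [source];
    let total = 0;
    while (queue.length > 0) {
      const u = queue.shift();
      for (const v of adj.get(u)) {
        if (!dist.has(v)) {
          dist.set(v, dist.get(u) + 1);
          total += dist.get(v);
          queue.push(v);
        }
      }
    }
    result.set(source, total > 0 ? (adj.size - 1) / total : 0);
  }
  return result;
}

// Betweenness: for each source, count shortest paths (sigma) with a BFS,
// then push pair dependencies back from the farthest nodes.
function betweenness(adj) {
  const nodes = [...adj.keys()];
  const score = new Map(nodes.map((n) => [n, 0]));
  for (const s of nodes) {
    const order = [];
    const preds = new Map(nodes.map((n) => [n, []]));
    const sigma = new Map(nodes.map((n) => [n, 0]));
    const dist = new Map([[s, 0]]);
    sigma.set(s, 1);
    const queue = [s];
    while (queue.length > 0) {
      const u = queue.shift();
      order.push(u);
      for (const v of adj.get(u)) {
        if (!dist.has(v)) {
          dist.set(v, dist.get(u) + 1);
          queue.push(v);
        }
        if (dist.get(v) === dist.get(u) + 1) {
          sigma.set(v, sigma.get(v) + sigma.get(u));
          preds.get(v).push(u);
        }
      }
    }
    const delta = new Map(nodes.map((n) => [n, 0]));
    while (order.length > 0) {
      const w = order.pop();
      for (const u of preds.get(w)) {
        delta.set(u, delta.get(u) + (sigma.get(u) / sigma.get(w)) * (1 + delta.get(w)));
      }
      if (w !== s) score.set(w, score.get(w) + delta.get(w));
    }
  }
  // In an undirected graph every pair is visited from both ends.
  for (const n of nodes) score.set(n, score.get(n) / 2);
  return score;
}

const adj = buildAdjacency(edges);
const deg = degree(adj);
const close = closeness(adj);
const between = betweenness(adj);
console.log(deg, close, between);
```

In this toy network bob and dan tie on all three measures, while cat has a higher degree than ann or eve but never sits on a shortest path between two other nodes.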
Social Network Analysis
Can be used to
– detect social debt
– identify social bus factor
– predict software failures
– … and many more …
Social Network Analysis
Social bus factor in Gentoo Linux
– All bug handling became concentrated in one contributor
– Measured by significant increase of centralization and performance.
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013
Social Network Analysis
Social bus factor in Gentoo Linux
– Contributor suddenly left the project, being dissatisfied
– Sentiment analysis showed correlation with negative emotions
– Lasting negative impact on the bug handling performance of the community.
Zanetti et al. “The rise and fall of a central contributor: Dynamics of social organization and performance in the Gentoo community.” CHASE 2013
Social Network Analysis
Use of SNA to better predict software failures
– By combining program dependency information with social network information
Bird et al. “Putting it All Together: Using Socio-Technical Networks to Predict Failures.” ISSRE 2009
Pinzger et al. “Can developer-module networks predict failures?” FSE 2008
Mirroring hypothesis
Conway’s law
Software structure tends to mirror the organisational/social structure
A.k.a. socio-technical congruence
alignment between technical dependencies and social coordination in a project
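One simple way to quantify this alignment is the fraction of coordination needs implied by technical dependencies that are matched by actual communication. The sketch below is an illustrative ratio under stated assumptions, not the measure from any specific paper: module names, owners and communication links are hypothetical, and a coordination need is assumed whenever two different developers own modules that depend on each other.

```javascript
// Hypothetical inputs.
const moduleDeps = [["ui", "core"], ["core", "db"]];         // technical dependencies
const owners = { ui: ["ann"], core: ["bob"], db: ["cat"] };  // who works on which module
const talks = [["ann", "bob"]];                              // observed communication

// Order-independent key for an unordered pair of developers.
const pairKey = (a, b) => [a, b].sort().join("|");

function congruence(deps, ownership, communication) {
  // Coordination needs: developer pairs whose modules depend on each other.
  const required = new Set();
  for (const [m1, m2] of deps) {
    for (const d1 of ownership[m1] || []) {
      for (const d2 of ownership[m2] || []) {
        if (d1 !== d2) required.add(pairKey(d1, d2));
      }
    }
  }
  const actual = new Set(communication.map(([a, b]) => pairKey(a, b)));
  let satisfied = 0;
  for (const pair of required) {
    if (actual.has(pair)) satisfied += 1;
  }
  // With no coordination needs at all, treat the project as fully congruent.
  return required.size === 0 ? 1 : satisfied / required.size;
}

const ratio = congruence(moduleDeps, owners, talks);
console.log(ratio);
```

Here two coordination needs exist (ann with bob, bob with cat) and only one is matched by communication, so half of the technical structure is mirrored socially.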
Mirroring hypothesis
Conway’s law
• Evidence in favor: commercial “in-house” development
• Evidence against: “community-based” development
More modular software => emergent “complex network” structure?
MacCormack et al. “Exploring the duality between product and organizational architectures: A test of the mirroring hypothesis.” Research Policy, 2012.
Colfer et al. “The mirroring hypothesis: Theory, evidence and exceptions.” Harvard Business School, 2010.
Interdisciplinary research: Complex Systems
Interdisciplinary research: Complexity Theory
Interdisciplinary research: Complex Systems
“A new approach to science that investigates how relationships between parts give rise to the collective behaviors of a system and how the system interacts and forms relationships with its environment.”
Emergence: process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties.
Complexity Theory: Network Theory
Citation from Mitchell’s book:
“network thinking is providing novel ways to think about difficult problems such as how to do efficient search on the Web, […] how to manage large organisations, how to preserve ecosystems, […] and, more generally, what kind of resilience and vulnerabilities are intrinsic to natural, social, and technological networks, and how to exploit and protect such systems.”
Complexity Theory: Network Theory
Some characteristics of complex networks:
Small-world property
• Low average path length between any two nodes
• Highly-clustered components linked through hubs
Skewed distributions (power law behaviour)
• Few nodes with very high in-degree (resp. out-degree), many nodes with very small in-degree (resp. out-degree)
Complexity Theory: Network Theory
Some characteristics of complex networks:
Scale-freeness
• Observed degree distribution is very similar regardless of the scale of the observation
Scale-free networks are resilient
• Robust to deletion of random (non-hub) nodes
• Vulnerable to the deletion of hubs
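The contrast between deleting a hub and deleting a non-hub node can be sketched by measuring the largest connected component that remains. This is a minimal illustration on a hypothetical hub-and-spoke graph, not a real scale-free network; node names are made up.

```javascript
// Hypothetical hub-and-spoke network with one extra edge between two spokes.
const network = [["hub", "a"], ["hub", "b"], ["hub", "c"], ["hub", "d"], ["a", "b"]];

// Size of the largest connected component after deleting `removed`
// (pass null to keep every node).
function largestComponentSize(edgeList, removed) {
  const adj = new Map();
  for (const [u, v] of edgeList) {
    for (const n of [u, v]) {
      if (n !== removed && !adj.has(n)) adj.set(n, []);
    }
    if (u !== removed && v !== removed) {
      adj.get(u).push(v);
      adj.get(v).push(u);
    }
  }
  const seen = new Set();
  let best = 0;
  for (const start of adj.keys()) {
    if (seen.has(start)) continue;
    // Depth-first traversal of one component.
    let size = 0;
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const u = stack.pop();
      size += 1;
      for (const v of adj.get(u)) {
        if (!seen.has(v)) {
          seen.add(v);
          stack.push(v);
        }
      }
    }
    best = Math.max(best, size);
  }
  return best;
}

const intact = largestComponentSize(network, null);
const withoutSpoke = largestComponentSize(network, "c");
const withoutHub = largestComponentSize(network, "hub");
console.log(intact, withoutSpoke, withoutHub);
```

Removing a spoke leaves the remaining nodes connected, while removing the hub fragments the network into small pieces.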
Complexity Theory: Network Theory
Examples of complex networks exhibiting these characteristics
– World-Wide Web
– (Technical) software dependency graphs
– Social networks (e.g. Facebook)
– (Socio-technical) software ecosystems
Complexity Theory: Network Theory
Examples of software system dependency networks
Network Theory: Possible applications for SECOs
• Provide prediction/forecasting models
  – of how SECOs emerge
  – of how SECOs grow/evolve
• Estimate the resilience and sustainability of SECOs after major disturbances
• Assess risk of deleting hub nodes => bus factor!
Network Theory: Possible applications for SECOs
How do SECOs emerge and grow?
A popular model is preferential attachment
Over time, nodes with higher degree receive more links than nodes with lower degree.
Extensions of this model have been proposed to simulate the growth of complex software systems
By mimicking the principle of coupling & cohesion
Barabasi et al. Emergence of Scaling in Random Networks. Science 286, 1999
Li et al. Multi-Level Formation of Complex Software Systems. Entropy 18(178), 2016
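The basic preferential attachment rule can be simulated in a few lines. The sketch below is a simplified variant in which each new node adds exactly one link, chosen with probability proportional to the current degree of the target; the function name and parameters are illustrative, and the software-specific extensions cited above are not reproduced.

```javascript
// Grow a network of n nodes (n >= 2). Returns the degree of every node.
// `random` must return numbers in [0, 1); Math.random is the default.
function preferentialAttachment(n, random = Math.random) {
  // Start from two connected nodes.
  const degree = [1, 1];
  // Each node appears in `endpoints` once per incident edge, so a uniform
  // pick from this array selects a node with probability proportional
  // to its degree.
  const endpoints = [0, 1];
  for (let node = 2; node < n; node++) {
    const target = endpoints[Math.floor(random() * endpoints.length)];
    degree[target] += 1;
    degree.push(1);
    endpoints.push(target, node);
  }
  return degree;
}

const degrees = preferentialAttachment(1000);
const maxDegree = Math.max(...degrees);
const leaves = degrees.filter((d) => d === 1).length;
console.log(`max degree: ${maxDegree}, nodes with degree 1: ${leaves}`);
```

Runs of this sketch typically show many nodes with a single link and a few nodes with much higher degree, which is the skewed pattern described for complex networks; exact numbers vary from run to run.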
Interdisciplinary research: Ecology and natural ecosystems
Ecology and natural ecosystems
Biodiversity of species E.g. hosts – parasites / plants – pollinators
Mutual dependency and functional redundancy
Disappearing species may be compensated by others if there is sufficient diversity in both layers.
Ecology and natural ecosystems
Diversity metrics
• species richness = number of different species in the ecosystem
• species evenness (entropy) = relative abundance of the population of each species in the ecosystem
• Shannon diversity index (relative entropy) = specialisation of a given species in relation to the species in the other level
• Simpson index = degree of concentration when individuals are classified into species
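As a rough illustration, the metrics can be computed from a list of abundance counts, for example commits per contributor in a project. The sketch below assumes the common single-level textbook forms (Shannon entropy H = -Σ p ln p, evenness as H / ln S, Simpson index as Σ p²); the two-level variants the talk alludes to may be defined differently, and the sample counts are made up.

```javascript
// counts[i] = number of individuals of species i
// (e.g. commits made by contributor i; hypothetical data).
function diversity(counts) {
  const positive = counts.filter((c) => c > 0);
  const total = positive.reduce((a, b) => a + b, 0);
  const shares = positive.map((c) => c / total);

  const richness = positive.length;
  // Shannon entropy with natural logarithm.
  const shannon = -shares.reduce((sum, p) => sum + p * Math.log(p), 0);
  // Evenness: entropy relative to its maximum ln(richness).
  const evenness = richness > 1 ? shannon / Math.log(richness) : 0;
  // Simpson index: probability that two random individuals share a species.
  const simpson = shares.reduce((sum, p) => sum + p * p, 0);

  return { richness, shannon, evenness, simpson };
}

const balanced = diversity([10, 10, 10, 10]);
const concentrated = diversity([37, 1, 1, 1]);
console.log(balanced, concentrated);
```

Both samples have the same richness, but the second one has lower evenness and a much higher Simpson index, which is the kind of concentration a bus factor analysis looks for.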
Software Ecosystems
Diversity in software ecosystems
Mutual dependency and functional redundancy
Disappearance of projects or contributors may be
compensated by others.
Software Ecosystems: Diversity
Are software project teams diverse?
– In terms of code ownership, types of activity, gender balance, seniority, …
How does this diversity affect …
– defect-proneness?
– productivity?
– …
Software Ecosystems: Diversity
Success story of diversity measures:
Assess defect-proneness in software projects
• More focused developers introduce fewer defects.
• Modules receiving narrowly focused activity are more likely to contain defects.
Posnett et al. Dual Ecological Measures of Focus in Software Development. ICSE 2013
Software Ecosystems: Gender Diversity
Effect of gender diversity on productivity?
Women underrepresented in programming
– industry: 16-18% female developers
– open source: ~10%
– social coding platforms:
  • GitHub: ~9%
  • StackOverflow: ~7%
Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015
A Data Set for Social Diversity Studies of GitHub Teams (MSR’15)
Software Ecosystems: Gender Diversity
Success story of diversity measures:
– Gender and tenure diversity are positive and significant predictors of productivity
– Teams that are more balanced in terms of gender and seniority have higher productivity rates
Vasilescu et al. Gender and tenure diversity in GitHub teams. CHI 2015
Interdisciplinary research: Survival Analysis
Statistical technique used in many disciplines to analyze the time until the occurrence of an event of interest
• Medicine
  – Effect of treatment or medicine to cure disease
  – Effect of disease on patient mortality
• Sociology
  – Factors influencing marriage or divorce
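A standard building block of survival analysis is the Kaplan-Meier estimator, which estimates the probability of surviving beyond each observed event time while accounting for censored observations. The sketch below is a minimal version under stated assumptions: the data is hypothetical, `event: true` means a project was observed to be abandoned at `time`, and `event: false` means it was still active when observation stopped.

```javascript
// Returns the survival curve as [{ time, survival }], one entry per
// distinct time at which at least one event occurred.
function kaplanMeier(observations) {
  const sorted = [...observations].sort((a, b) => a.time - b.time);
  const curve = [];
  let atRisk = sorted.length;
  let survival = 1;
  let i = 0;
  while (i < sorted.length) {
    const t = sorted[i].time;
    let events = 0;
    let leaving = 0;
    // Group all observations that share the same time.
    while (i < sorted.length && sorted[i].time === t) {
      if (sorted[i].event) events += 1;
      leaving += 1;
      i += 1;
    }
    if (events > 0) {
      survival *= 1 - events / atRisk;
      curve.push({ time: t, survival });
    }
    // Both events and censored observations leave the risk set.
    atRisk -= leaving;
  }
  return curve;
}

// Hypothetical project lifetimes in years.
const projects = [
  { time: 1, event: true },
  { time: 2, event: false },
  { time: 3, event: true },
  { time: 3, event: true },
  { time: 5, event: false },
];
const curve = kaplanMeier(projects);
console.log(curve);
```

Censored projects never lower the curve themselves, but they shrink the risk set, so later abandonments weigh more heavily.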
Interdisciplinary research: Survival Analysis
Success story: OSS project survival
Factors positively influencing survival:
– #contributors
– Project age
Basis for prediction models
Samoladas et al. Survival analysis on the duration of open source projects. IST 2010
SECO Research Challenges continued…
Understanding SECOs
• How do different SECOs compare?
• How to face technical challenges?
  – Big data
  – Privacy versus reproducibility
Serebrenik et al. “Challenges in Software Ecosystems Research.” IWSECO-WEA 2015
Research Challenge: Comparing SECOs
• Each software ecosystem
  – has specific habits, expectations, change policies
  – uses specific tools
• Taking into account these differences is important
  – to support SECO maintenance and evolution
  – to generalise research findings across SECOs
Bogart et al. “How to break an API: Cost negotiation and community values in three software ecosystems.” FSE 2016
Decan et al. “On the topology of package dependency networks – A comparison of three programming language ecosystems.” WEA 2016
Research Challenge: Big Data
4V: Volume, Velocity, Variety, Veracity
Research Challenge
Privacy Reproducibility
Research Challenge: Privacy vs reproducibility
How to preserve privacy of individuals?
– EU 2016/679 regulation on the protection of natural persons with regard to the processing of personal data and on the free movement of such data
“The principles of data protection should apply to any information concerning an identified or identifiable natural person.”
– Appropriate anonymisation and privacy-preserving data mining techniques needed
Fung et al. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys 2010
Malik et al. Privacy preserving data mining techniques: Current scenario and future prospects. IC3T 2012
Research Challenge: Privacy vs reproducibility
• Increase/ensure reproducible research results
  – Awareness is increasing
  – Solutions are being put into place
  – Big data problems remain an issue
• How to reconcile privacy with reproducibility?
Gonzalez-Barahona et al. On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Emp. Softw. Eng. 2012
Wrap-up
Research on SECO evolution requires
– A socio-technical view
– Mixed method research
– Interdisciplinary research
Many technical challenges need to be faced
Are you willing to take up the challenge?