1
Recovering Commit Dependencies for Selective Code Integration in
Software Product Lines
Tejinder Dhaliwal, Foutse Khomh, Ying Zou, Ahmed E. Hassan
2
Software Product Lines
Multiple Products
Production
Shared Components
3
Main Line Branching Model for Software Product Lines
Main Branch
Product-1 Branch
Product-2 Branch
Product-n Branch
Developers add new features
Integrators integrate selected features
4
Feature to Code Change Mapping
Developer adds Code Changes
Integrator integrates Features
Features    Code Changes
FA          CA1
FB          CB1
Mapping facilitates selective integration
5
Cost of Integration Failure
• If change CA1 implements FA and change CB1 implements FB
Feature     Code Changes
FA          CA1
FB          CB1
• If a change CA2 is added to modify FA and CA2 is dependent on CB1
Missing Dependencies
CA2         CB1
[Figure: commits CA1, CA2, and CB1 on the main branch; "Integrate FA" and "Integrate FB" into Product-1 Branch and Product-2 Branch; annotation: 530% more time]
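The scenario above can be sketched as a small check. This is only an illustration, assuming the feature-to-commit mapping and the commit dependencies are already known; the dictionaries and the helper `missing_dependencies` are hypothetical names, and only direct dependencies are considered.

```python
# Hypothetical data mirroring the slide: FA is implemented by CA1 and CA2,
# FB by CB1, and CA2 depends on CB1.
feature_commits = {"FA": {"CA1", "CA2"}, "FB": {"CB1"}}
depends_on = {"CA2": {"CB1"}}


def missing_dependencies(feature, feature_commits, depends_on):
    """Return commits that `feature` needs but its own mapping does not include."""
    selected = feature_commits[feature]
    required = set()
    for commit in selected:
        # Collect the direct dependencies of every commit in the feature.
        required |= depends_on.get(commit, set())
    return required - selected


missing_fa = missing_dependencies("FA", feature_commits, depends_on)
missing_fb = missing_dependencies("FB", feature_commits, depends_on)
```

Integrating FA alone would leave CB1 behind, which is the kind of missing dependency the slide describes; FB has none.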
6
Our Solution
CA1 CB1 CA2
Group dependent commits and propose to integrate a group as a whole
8
Our Approach to Group Dependent Commits
• Define Dissimilarity Metrics
• Calibrate the Metrics on Prior Versions
• Commit Assignment Algorithm
– Automated Grouping (during Integration)
– Developer Guided Grouping (during Development)
9
Dissimilarity Metrics
Given two commits characterized by files, developers, and change requests (CRs):
Metric: Description
File Dependency Distance (FD): Captures source code dependencies between files involved in two commits
File Association Distance (FA): Captures logical dependencies between files involved in two commits
Developer Dissimilarity Distance (DD): Captures the working relation between two developers submitting commits
CR Dependency Distance (CRD): Captures the dissimilarity between the CRs implemented by two commits
11
Calibrate Metrics on Prior Versions
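For each of the four metrics:
• Min_Threshold = Avg(a)
• Max_Threshold = Avg(bmin)
• Silhouette = Avg{(bmin - a) / max(bmin, a)}
A higher silhouette value is better
[Figure: distance a within a commit's own group and distances b1, b2, b3 to the other groups; bmin is the smallest of b1, b2, b3]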
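The calibration formulas can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation. It assumes the standard silhouette reading, in which `a` is a commit's average distance to its own group and each `b` is its average distance to another group. The function name `calibrate` and the sample values are hypothetical.

```python
from statistics import mean


def calibrate(samples):
    """Calibrate one metric from prior-version data.

    samples: list of (a, [b1, b2, ...]) pairs, one pair per commit.
    """
    a_values = [a for a, _ in samples]
    bmin_values = [min(bs) for _, bs in samples]
    silhouettes = [
        # Guard against division by zero when both distances are 0.
        (bmin - a) / max(bmin, a) if max(bmin, a) > 0 else 0.0
        for a, bmin in zip(a_values, bmin_values)
    ]
    return {
        "min_threshold": mean(a_values),     # Avg(a)
        "max_threshold": mean(bmin_values),  # Avg(bmin)
        "silhouette": mean(silhouettes),     # Avg{(bmin - a) / max(bmin, a)}
    }


# Hypothetical distances for two commits.
result = calibrate([(0.2, [0.8, 0.9]), (0.4, [0.5, 1.0])])
```

A silhouette close to 1 means commits sit much closer to their own group than to any other, which is why a higher value indicates a more useful metric.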
13
Commit Assignment Algorithm
• Apply the dissimilarity metrics in order of their precedence (e.g., Color > Shape)
• If no suitable group is found for a commit, assign the commit to a new group
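The two rules above can be sketched as follows. The slides do not spell out how the calibrated thresholds are used, so the decision rule here is an assumption. A distance at or below the minimum threshold is treated as a match, groups beyond the maximum threshold are ruled out, and the remaining groups are passed to the next metric. The names `assign_commit` and `attribute_distance` are hypothetical, and the toy Color and Shape metrics stand in for the real ones.

```python
def assign_commit(commit, groups, metrics):
    """Assign `commit` to a group and return that group's index.

    metrics: list of (distance_fn, min_threshold, max_threshold) tuples in
    precedence order, where distance_fn(commit, group) returns a float.
    """
    candidates = list(range(len(groups)))
    for distance_fn, min_t, max_t in metrics:
        scored = [(distance_fn(commit, groups[i]), i) for i in candidates]
        # Assumption: a distance at or below min_threshold is a confident match.
        close = [(d, i) for d, i in scored if d <= min_t]
        if close:
            best = min(close)[1]
            groups[best].append(commit)
            return best
        # Assumption: groups beyond max_threshold are ruled out; the rest
        # are decided by the next metric in the precedence order.
        candidates = [i for d, i in scored if d <= max_t]
        if not candidates:
            break
    # No suitable group was found, so the commit starts a new group.
    groups.append([commit])
    return len(groups) - 1


def attribute_distance(key):
    # Toy distance: 0.0 if the commit shares `key` with any group member, else 1.0.
    return lambda commit, group: 0.0 if any(m[key] == commit[key] for m in group) else 1.0


# Color takes precedence over Shape.
metrics = [(attribute_distance("color"), 0.0, 1.0), (attribute_distance("shape"), 0.0, 0.5)]
groups = [[{"color": "red", "shape": "circle"}], [{"color": "blue", "shape": "square"}]]

first = assign_commit({"color": "blue", "shape": "circle"}, groups, metrics)
second = assign_commit({"color": "green", "shape": "circle"}, groups, metrics)
third = assign_commit({"color": "yellow", "shape": "triangle"}, groups, metrics)
```

The blue circle joins the blue group because color outranks shape, the green circle falls back to shape, and the yellow triangle matches nothing and opens a new group.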
15
Commit Grouping Approaches
• Automated Grouping (during integration)
• Developer-guided Grouping (during development): groups commits incrementally and uses developers' feedback to improve the grouping
Both approaches follow the k-means clustering method, which consists of assigning each item to the cluster with the nearest mean.
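The nearest-mean step of k-means can be illustrated on plain numeric points. This is a generic sketch of that step only, not the grouping approaches themselves, which work on commit dissimilarities. The function `nearest_mean` and the sample points are hypothetical.

```python
import math


def nearest_mean(point, clusters):
    """Return the index of the cluster whose mean is closest to `point`."""
    def centroid(members):
        # Coordinate-wise mean of the cluster members.
        return [sum(coords) / len(members) for coords in zip(*members)]

    return min(
        range(len(clusters)),
        key=lambda i: math.dist(point, centroid(clusters[i])),
    )


clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0)]]
idx_near_origin = nearest_mean((1.0, 1.0), clusters)
idx_far = nearest_mean((9.0, 9.0), clusters)
```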
16
Evaluation
We analyzed three major versions of a family of mobile applications
17
Evaluation Criteria
• Validate the dissimilarity metrics
– Can the proposed metrics be used to identify commit dependencies?
• Validate the grouping approaches
– How efficient are our proposed grouping approaches?
• Value for Developers
– Can the proposed approaches identify commit dependencies missed by developers?
18
Results
The four dissimilarity metrics display good abilities in grouping commits (i.e., high silhouette values)
Silhouette Value
Metric    Version 1    Version 2    Version 3
CRD       0.94         0.96         0.96
FA        0.76         0.79         0.67
DD        0.63         0.67         0.57
FD        0.46         0.47         0.49
CRD > FA > DD > FD
19
Results
• Efficiency of the Grouping Approaches
– 82% of commit dependencies were recovered by the automated grouping with a precision of 95%
– The accuracy of the developer-guided grouping approach is 98%
– We observed that precision/recall improves with longer history data
• Value for Developers
– Automated grouping and developer-guided grouping approaches were able to reduce integration failures by 76% and 94% respectively
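For readers unfamiliar with the measures reported above, precision and recall over recovered commit dependencies can be computed as in this sketch. The dependency pairs are made-up illustrations, not data from the study, and `precision_recall` is a hypothetical helper.

```python
def precision_recall(recovered, actual):
    """Compute precision and recall for sets of (commit, dependency) pairs."""
    true_positives = len(recovered & actual)
    precision = true_positives / len(recovered) if recovered else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical example: four real dependencies, three of them recovered.
actual = {("CA2", "CB1"), ("CA3", "CA1"), ("CB2", "CB1"), ("CC1", "CA1")}
recovered = {("CA2", "CB1"), ("CA3", "CA1"), ("CB2", "CB1")}

precision, recall = precision_recall(recovered, actual)
```

Here every recovered pair is correct, so precision is 1.0, while recall is 0.75 because one real dependency was not recovered.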
20
Summary