Exploring Core-Periphery Structures
©Alan MacCormack, John Rusnak, Carliss Baldwin 2009 1
Exploring Core-Periphery Structures in Complex Software Products
Carliss Baldwin (HBS)
Alan MacCormack (MIT), John Rusnak (HBS)
Drexel University
Philadelphia, May 2009
Architecture and Intellectual Property
Design for capturing/defending value, not for collaboration!
Platform Component of a Java Server
LaMantia et al. (WICSA 2008)
Used licensed-in code
The license was about to expire
Creating a classic holdup problem
They created a “thin crossing point” between their code and the licensed code
Before Modularization | After Modularization
Licensed Code
No Dependencies
Thin crossing points <=> Low transaction costs
The presence/absence of, or propensity to have, thin crossing points <=> modular structure of the design
Conceptual Background
• Much academic work suggests that complex technical systems possess a “Core-Periphery” structure
– Core = tightly-coupled components, central to system operation
– Periphery = loosely-coupled components, optional/non-critical
• Little empirical work explores the extent to which such structures are observed in practice, or the factors that influence their size, nature and evolution
• Our Aim: Develop a system to reveal the Core-Periphery structure of real software systems; analyze a large sample
The Intuition: Core-Periphery Structures
Distribution of Coupling Metrics
Measures of Visibility have a Bi-Modal (or Multi-Modal) Distribution
Number of Direct Dependencies has an Exponential Distribution
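A minimal sketch of how these two kinds of coupling metric could be computed. It assumes, which the slides do not spell out, that visibility is taken from the transitive closure of the direct-dependency matrix: a file's V-fan-out counts the files it reaches directly or indirectly, and its V-fan-in counts the files that reach it. The toy matrix and the function names are hypothetical.

```python
def transitive_closure(dsm):
    """Warshall's algorithm on a 0/1 direct-dependency matrix.

    dsm[i][j] == 1 means file i depends directly on file j.
    """
    n = len(dsm)
    vis = [row[:] for row in dsm]
    for i in range(n):
        vis[i][i] = 1  # assumption: every file is visible to itself
    for k in range(n):
        for i in range(n):
            if vis[i][k]:
                for j in range(n):
                    if vis[k][j]:
                        vis[i][j] = 1
    return vis


def coupling_metrics(dsm):
    """Return (direct fan-out, visibility fan-out, visibility fan-in) per file."""
    n = len(dsm)
    vis = transitive_closure(dsm)
    direct_out = [sum(row) for row in dsm]
    v_fan_out = [sum(vis[i]) for i in range(n)]
    v_fan_in = [sum(vis[i][j] for i in range(n)) for j in range(n)]
    return direct_out, v_fan_out, v_fan_in


# Hypothetical 5-file system: 0 -> 1, 1 <-> 2 (a cycle), 2 -> 3, file 4 isolated.
toy = [
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
direct_out, v_fan_out, v_fan_in = coupling_metrics(toy)
print(direct_out)  # [1, 1, 2, 0, 0]
print(v_fan_out)   # [4, 3, 3, 1, 1]
print(v_fan_in)    # [1, 3, 3, 4, 1]
```

Even in this small example the direct counts stay low while the visibility counts separate files into groups, which is the kind of contrast the two distributions describe.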
Defining the Core: The “Spectrum Plot”
A DSM in Core-Periphery (CP) View
Key Questions
• Do all systems have a Core-Periphery structure; can we predict those that do?
• How large is the Core; what factors predict whether the Core is large/small?
• Are Core Components located in close proximity or distributed about the system?
• What happens to the size of the Core over time; does it remain stable or grow?
Empirical Approach: Analysis of over 1,000 Software Systems using DSAS
• Darwin
• “MyBooks” (Disguised)
• Abiword
• Apache
• BDB
• Chrome
• Calc (Open Office)
• Ghostscript
• Gnucash
• Gnumeric
• Linux
• Mozilla
• MySQL
• Open AFS
• Open Office (All)
• Open Solaris
• PostGres
• Write (Open Office)
• XNU
Key Findings
• About 2/3rds of these systems have Core-Periphery structure
– Remainder may have “No Core” or have “Multiple-Cores”
• Cannot always tell if a system is Core-Periphery from DSM
– Direct dependencies are insufficient; Pattern of dependencies is key
• “Core” sizes vary significantly; from zero to thousands
– Large variations, even for systems that “do the same thing”
– Aligned with Open and Closed organizational choices (see Conway)
• Core Components are NOT collocated, tend to be distributed
– Designers may be unaware which components are Core
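To illustrate why counting direct dependencies is not enough, the sketch below compares two hypothetical 4-file systems with the same number of direct dependencies but different patterns. Treating the Core as the largest group of files that all reach one another is an assumption of this sketch, not a definition taken from the slides; the function names are likewise hypothetical.

```python
def reachable(edges, n):
    """reach[i] = set of files that file i depends on, directly or indirectly."""
    adj = {i: set() for i in range(n)}
    for src, dst in edges:
        adj[src].add(dst)
    reach = []
    for start in range(n):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reach.append(seen)
    return reach


def largest_cyclic_group(edges, n):
    """Size of the largest set of files that all depend on one another."""
    reach = reachable(edges, n)
    best = 1
    for i in range(n):
        group = {j for j in range(n)
                 if j == i or (j in reach[i] and i in reach[j])}
        best = max(best, len(group))
    return best


hierarchy = [(0, 1), (0, 2), (1, 3), (2, 3)]  # layered, no cycles
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]      # every file reaches every other

print(len(hierarchy), largest_cyclic_group(hierarchy, 4))  # 4 1
print(len(cycle), largest_cyclic_group(cycle, 4))          # 4 4
```

Both systems have four direct dependencies, yet only the second contains a tightly coupled group spanning all files.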
2/3rds of Systems are Core-Periphery
Put two clear examples of Core-Periphery systems here
Notes: Biased sample; systems do change over time (e.g., Linux)
2/3rds of Systems are Core-Periphery
One small core, one larger (Linux 2.1.88 vs Mozilla; both 1500 files)
Some Systems are “Multi-Core”…
Open Office Spectrum Plots: Note matching V-fan-ins and V-fan-outs
Open Office v1.0
Database
Write word processor
Calc spreadsheet
Graphics system
Presentations, charts, drawing
…with Modules that are Core-Periphery
Open Office Calc Subsystem: Architect’s View
…with Modules that are Core-Periphery
Open Office Calc Subsystem: Core-Periphery View
Open Office v1.0—Core-Periphery View
Core-Periphery Analysis iterated on the Modules Calc and Write
Some Systems have no Core: Gnucash
This is NOT apparent from the DSM
Implications for Code Architects!
Core Components are Distributed: Can be Difficult to Identify
My Books—Core files distributed throughout the system
Systems of Similar Size can have very different Core sizes
Release            Size   V-FanIn-Only   V-FanOut-Only   V-Both       V-Other
linux_1_1_23       301      7   2%       196  65%         39  13%      59  20%
db_4_1_24          305     19   6%       124  41%        107  35%      55  18%
gnumeric_1_1_18    355     35  10%       102  29%        159  45%      59  17%
postgresql_6_2_1   370     29   8%        85  23%        209  56%      47  13%
linux_1_1_92       400     11   3%       250  63%         38  10%     101  25%
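The four columns above suggest a two-way split on visibility fan-in and fan-out. A minimal sketch of such a classification follows; the cut-off values, the toy data, and the function names are hypothetical, since the slides do not state how "high" fan-in or fan-out is defined.

```python
from collections import Counter


def classify(v_fan_in, v_fan_out, in_cut, out_cut):
    """Assign one file to a category; in_cut and out_cut are assumed thresholds."""
    high_in = v_fan_in >= in_cut
    high_out = v_fan_out >= out_cut
    if high_in and high_out:
        return "V-Both"
    if high_in:
        return "V-FanIn-Only"
    if high_out:
        return "V-FanOut-Only"
    return "V-Other"


def summarize(files, in_cut, out_cut):
    """Return {category: (count, percent of system)} for (fan_in, fan_out) pairs."""
    counts = Counter(classify(fi, fo, in_cut, out_cut) for fi, fo in files)
    total = len(files)
    return {cat: (n, round(100 * n / total)) for cat, n in counts.items()}


# Hypothetical system of five files, given as (V-fan-in, V-fan-out).
toy_files = [(9, 9), (9, 1), (1, 9), (1, 9), (1, 1)]
summary = summarize(toy_files, in_cut=5, out_cut=5)
for category, (count, percent) in summary.items():
    print(f"{category:14s} {count:3d} {percent:3d}%")
```

With real data, the V-Both row of such a summary would correspond to the Core size reported in each table row.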
Systems of Similar Size and Function often have very different Core sizes
Linux | Open Solaris
The “Core”
Spectrum Comparisons
Linux | Open Solaris
Note: Very Different Sizes!
Systems of Similar Size and Function often have very different Core sizes
Gnucash (no Core) My Books (70% Core)
Core Sizes Evolve Differently: Sometimes they are Stable in Size
Core Sizes Evolve Differently: Sometimes they Grow at a Linear rate
[Chart: Linux Core Size. X-axis: System size in source files (0 to 10000). Left Y-axis: Number of V_Both files (0 to 600). Right Y-axis: System percent of V_Both files (0% to 30%).]
The Challenge of an Increasing Core…
Release      Size    V-FanIn-Only   V-FanOut-Only   V-Both       V-Other
on_src_b36   12105   458  4%        6629  55%       2892  24%    2126  18%
on_src_b37   12149   455  4%        6650  55%       2917  24%    2127  18%
on_src_b38   12330   458  4%        6813  55%       2921  24%    2138  17%
on_src_b39   12336   460  4%        6815  55%       2924  24%    2137  17%
on_src_b40   12343   462  4%        6821  55%       2925  24%    2135  17%
on_src_b41   12404   485  4%        6803  55%       3001  24%    2115  17%
on_src_b42   12407   485  4%        6811  55%       3002  24%    2109  17%
on_src_b45   12568   495  4%        6877  55%       3081  25%    2115  17%
on_src_b46   12567   495  4%        6876  55%       3081  25%    2115  17%
on_src_b47   12550   493  4%        6865  55%       3075  25%    2117  17%
on_src_b48   12573   492  4%        6878  55%       3079  24%    2124  17%
on_src_b49   12644   491  4%        6932  55%       3099  25%    2122  17%
on_src_b50   12722   493  4%        6945  55%       3162  25%    2122  17%
on_src_b51   12732   491  4%        6993  55%       3121  25%    2127  17%
on_src_b52   12732   491  4%        6990  55%       3124  25%    2127  17%
on_src_b53   12771   492  4%        7003  55%       3147  25%    2129  17%
on_src_b54   12794   493  4%        7015  55%       3160  25%    2126  17%
on_src_b55   12794   493  4%        7015  55%       3160  25%    2126  17%
on_src_b56   12832   495  4%        7044  55%       3167  25%    2126  17%
on_src_b57   12838   499  4%        7049  55%       3168  25%    2122  17%
on_src_b58   12859   499  4%        7065  55%       3170  25%    2125  17%
on_src_b59   12860   499  4%        7067  55%       3169  25%    2125  17%
on_src_b60   12861   499  4%        7071  55%       3168  25%    2123  17%
on_src_b61   12897   500  4%        7104  55%       3171  25%    2122  16%
on_src_b62   12927   500  4%        7119  55%       3181  25%    2127  16%
on_src_b63   12935   498  4%        7127  55%       3183  25%    2127  16%
on_src_b65   12949   497  4%        7142  55%       3184  25%    2126  16%
The Core of Solaris
Core Sizes can Exhibit Discontinuities: E.g., The Evolution of Linux
Release       Size   V-FanIn-Only   V-FanOut-Only   V-Both      V-Other
linux_1_0      282    10  4%         186  66%        31  11%      55  20%
linux_1_1_0    282    10  4%         185  66%        31  11%      56  20%
linux_1_2_0    400    11  3%         249  62%        41  10%      99  25%
linux_1_3_0    431    12  3%         256  59%        42  10%     121  28%
linux_2_0      779    16  2%         520  67%        73   9%     170  22%
linux_2_1_0    785    19  2%         526  67%        70   9%     170  22%
linux_2_2_0   1891    25  1%        1368  72%        79   4%     419  22%
linux_2_3_0   1946    32  2%        1292  66%        75   4%     547  28%
linux_2_4_0   3243    46  1%        2513  77%       211   7%     473  15%
linux_2_5_0   4047    98  2%        2807  69%       259   6%     883  22%
linux_2_6_0   6194   156  3%        4395  71%       405   7%    1238  20%
Q: Did IBM’s Major Code Contributions start in Linux 2.4.0?
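One way to make the discontinuity visible is to compare each release's V-Both count with the previous one. The sketch below uses the Size and V-Both figures from the Linux table; the 2x ratio used to flag a jump is a hypothetical threshold chosen for illustration, not a criterion from the slides.

```python
# (release, total files, V-Both files), copied from the Linux table.
LINUX = [
    ("linux_1_0", 282, 31),
    ("linux_1_1_0", 282, 31),
    ("linux_1_2_0", 400, 41),
    ("linux_1_3_0", 431, 42),
    ("linux_2_0", 779, 73),
    ("linux_2_1_0", 785, 70),
    ("linux_2_2_0", 1891, 79),
    ("linux_2_3_0", 1946, 75),
    ("linux_2_4_0", 3243, 211),
    ("linux_2_5_0", 4047, 259),
    ("linux_2_6_0", 6194, 405),
]


def core_jumps(rows, ratio=2.0):
    """Releases whose V-Both count is at least `ratio` times the previous row's."""
    flagged = []
    for (_, _, prev), (name, _, cur) in zip(rows, rows[1:]):
        if prev > 0 and cur / prev >= ratio:
            flagged.append(name)
    return flagged


for name, size, both in LINUX:
    print(f"{name:12s} {both:4d} / {size:5d} = {both / size:.1%}")

jumps = core_jumps(LINUX)
print(jumps)  # ['linux_2_4_0']
```

Under this assumed threshold only linux_2_4_0 is flagged: its V-Both count rises from 75 to 211 while earlier releases change far more gradually.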
Core Sizes can be Influenced—Mozilla before and after Redesign
Core sizes as a percent of system size
1275 total systems
248 have VBoth=0
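As a quick arithmetic check on these two counts, the share of systems with no V-Both files can be worked out directly. The variable names are illustrative only.

```python
total_systems = 1275
no_core = 248  # systems with VBoth = 0

share_no_core = no_core / total_systems
print(f"{share_no_core:.1%}")      # 19.5%
print(f"{1 - share_no_core:.1%}")  # 80.5%
```

So roughly one system in five in the sample has no files in the V-Both category.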
Larger Open Source Systems Have Smaller Relative Core Sizes
Open Solaris
Conclusions
• Developed a method to extract Core-Periphery structures from Software and analyzed over 1,000 Software Systems
– 2/3rds of these systems have a single Core
– Some have no Core and others have Multi-Cores
• Difficult to tell from DSM if a system has zero, one or more Cores; difficult to tell which components are in the Core
• Core sizes:
– Cross-section: Vary significantly; organization structure matters
– Longitudinal: Evolve differently; can be influenced by redesigns