Posted on 12-Apr-2017
Multilevel Collaboration between Software Developers and the Impact of Proximity:
an Early, Preliminary Work
Dawn Foster, Guido Conaldi, Riccardo De Vita
University of Greenwich
Centre for Business Network Analysis http://www.gre.ac.uk/business/research/centres/cbna/home
Goals for Today
Very early work; seeking feedback on:
• Best approaches for incorporating multilevel concepts.
• Fitting a suitable model for multilevel networks.
• What we have done so far.
Research Overview
How do participants who are paid by firms collaborate within a fluid organization?
Proximity theory as a theoretical framework:
• to understand intraorganizational collaboration
• within fluid organizations
• using an open source software project, the Linux kernel, as the empirical setting.
Contributions
Contribute to the literature on fluid organizations by:
• Determining the impact of firm affiliation on intraorganizational collaboration between individuals in fluid organizations.
  – Existing studies of open source focus mostly on individual motivations.
  – Firms can influence the collaboration of their employees.
• Demonstrating that proximity theory can be used to better understand collaboration within fluid organizations.
  – Boschma’s (2005) five dimensions should further our understanding.
  – Most proximity studies are interorganizational; fluid boundaries blur the distinction between inter- and intraorganizational collaboration.
As fluid organizations become more common, understanding collaboration within them is increasingly important.
Fluid Organizations
• In fluid organizations, the boundaries and structures allow fluid movement within the organization as individuals collaborate to coordinate activities (Ashkenas et al., 2002; Glance & Huberman, 1994).
• Some fluid organizations are based on global virtual work across many time zones by people from different backgrounds (Nurmi & Hinds, 2016) and may include individuals from different firms and different types of institutions (O’Mahony & Bechky, 2008).
• Collaboration, especially within fluid organizations, crosses dimensions of proximity, including cognitive, organizational, social, institutional and geographical, which can be used to better understand collaboration (Balland, 2012; Boschma, 2005; Cantner & Graf, 2006; Crescenzi, Nathan, & Rodríguez-Pose, 2016; Knoben & Oerlemans, 2006).
Proximity Theory
• Social proximity: relations between actors based on trust that comes from friendship and experience (Boschma 2005).
• Institutional proximity: whether individuals collaborate more with others in a similar institutional setting, such as a corporation, non-profit, university, or no affiliation (Balland 2012; Crescenzi et al. 2013).
• Organizational proximity: relationships within an organizational structure (Boschma 2005); used here to examine collaboration within and between organizations.
• Cognitive proximity: similarity of frames of reference and knowledge (Knoben & Oerlemans 2006).
• Geographic proximity: physical, spatial distance between actors (Boschma 2005). Online, geographic proximity is often irrelevant, but others have used a temporal measure based on time zones (O’Leary & Cummings, 2007).
Empirical Setting: Open Source
• Open source is frequently studied as a fluid organization (e.g. Chen & O’Mahony, 2009; O’Mahony & Bechky, 2008; Puranam et al., 2014).
• Contributions by individuals, not firms (O’Mahony, 2007), but firms are increasingly paying employees to contribute as a way to participate (Jensen & Scacchi, 2007; Roberts et al., 2006).
• Linux kernel¹:
  – < 8% of contributions by unpaid software developers
  – Neutral project; competing companies participate
  – 22 million lines of code
  – 14,000 developers
  – 1,300 organisations
[Figure: software stack, labelled from system-only to user-facing: computer hardware (CPU, memory, disk); Linux kernel; Linux operating system (Red Hat, Ubuntu); applications (web browser, office).]
¹ Corbet & Kroah-Hartman, 2016
Collaboration Network
• Network ties: mailing lists (ego replies to alter)
  – Collaboration on code review, patch feedback, bugs, and discussions happens on the mailing lists before source code is accepted into the repository.
• “The mailing lists are still the primary communications space.”
• “All of our collaboration happens over discussing patches.”
[Figure: collaboration network; 10 mailing lists; 2015-01-27; 90 days; k-core >= 10]
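The tie definition above (ego replies to alter) and the k-core filter in the figure can be sketched in a few lines. This is a minimal illustration, not the pipeline used here: it assumes each archived message has been parsed into a dict with hypothetical `id`, `sender`, and `in_reply_to` keys, and it computes the k-core on the undirected reply network, which is an assumption since the slides do not say which degree was used.

```python
"""Sketch: reply ties from mailing-list messages, then a k-core filter.

Assumes messages are dicts with hypothetical keys 'id', 'sender' and
'in_reply_to' (None for thread starters); these names are illustrative.
"""


def reply_edges(messages):
    """Return directed (ego, alter) ties: ego replied to a message by alter."""
    messages = list(messages)  # allow a single-pass iterable as input
    sender_of = {m["id"]: m["sender"] for m in messages}
    edges = set()
    for m in messages:
        alter = sender_of.get(m.get("in_reply_to"))
        # Skip thread starters, replies to messages outside the sample,
        # and developers replying to themselves.
        if alter is not None and alter != m["sender"]:
            edges.add((m["sender"], alter))
    return edges


def k_core(edges, k):
    """Return the nodes of the k-core of the undirected reply network.

    Repeatedly deletes nodes with fewer than k distinct neighbours until
    every remaining node has at least k (assumption: undirected degree).
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        for node in list(neighbours):
            if len(neighbours[node]) < k:
                # Remove the node and keep the neighbour sets symmetric.
                for other in neighbours.pop(node):
                    neighbours[other].discard(node)
                changed = True
    return set(neighbours)
```

With the threshold in the figure, `k_core(reply_edges(messages), 10)` would keep only developers with at least ten distinct reply partners in the window.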
Multilevel Network
• Individual / Organizational / Mailing List Levels
  – Employers pay developers to enable the firm’s products, gain influence and set direction, share information, and more.
  – Most developers consider their affiliation with the Linux kernel community more important than their affiliation with their employer.
  – Almost all contributions come from paid software developers.
  – Collaboration occurs in 200+ mailing lists simultaneously.
• How does affiliation with a firm shape the collaboration of individuals?
• How do mailing lists enable collaboration?
Operationalizing Proximity
Using Boschma’s (2005) five dimensions of proximity:
• Organizational:
  – Operationalized as firm affiliation (company) or unaffiliated (hobbyist, etc.)
• Cognitive:
  – Usually measured based on shared knowledge / technologies
  – Operationalized as contributing to areas of the source code (subsystems)
• Geographic:
  – Usually measured based on physical location, which is less relevant for online collaboration
  – Operationalized using time zones (temporal geographic proximity)
• Institutional:
  – Operationalized based on employment by a firm, academia, or unaffiliated
• Social:
  – Often measured using the collaboration network itself (which risks double counting)
  – Operationalized as the number of times a dyad participated in the same mailing list threads
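The five operationalizations above can be sketched as dyadic covariates computed for each ordered (sender, receiver) pair. This is a minimal sketch under stated assumptions, not the measures used in the analysis: the `Developer` record and its fields are hypothetical, cognitive proximity is taken as the Jaccard similarity of the subsystems two developers touched, geographic proximity is the time-zone gap wrapped around the 24-hour clock, and two unaffiliated developers are not counted as organizationally proximate.

```python
"""Sketch: dyadic proximity covariates for ordered (sender, receiver) pairs.

All field names and measure choices below are illustrative assumptions.
"""
from dataclasses import dataclass, field
from itertools import permutations
from typing import Optional, Set


@dataclass
class Developer:
    name: str
    employer: Optional[str]  # None = unaffiliated (hobbyist, etc.)
    institution: str         # e.g. "firm", "academia", "unaffiliated"
    utc_offset: float        # time zone as hours relative to UTC
    subsystems: Set[str] = field(default_factory=set)  # code areas touched


def organizational(a, b):
    """1 if both work for the same firm; two unaffiliated developers get 0 (assumption)."""
    return int(a.employer is not None and a.employer == b.employer)


def institutional(a, b):
    """1 if both belong to the same type of institution."""
    return int(a.institution == b.institution)


def cognitive(a, b):
    """Jaccard similarity of the subsystems each developer contributed to."""
    union = a.subsystems | b.subsystems
    return len(a.subsystems & b.subsystems) / len(union) if union else 0.0


def timezone_gap(a, b):
    """Clock difference in hours, wrapping around the 24-hour day (0 to 12)."""
    diff = abs(a.utc_offset - b.utc_offset) % 24
    return min(diff, 24 - diff)


def social(a, b, threads):
    """Number of threads in which both developers posted.

    `threads` maps a thread id to the set of participant names.
    """
    return sum(1 for people in threads.values() if a.name in people and b.name in people)


def dyad_covariates(developers, threads):
    """Covariates for every ordered (sender, receiver) pair of developers."""
    return {
        (a.name, b.name): {
            "organizational": organizational(a, b),
            "institutional": institutional(a, b),
            "cognitive": cognitive(a, b),
            "timezone_gap": timezone_gap(a, b),
            "social": social(a, b, threads),
        }
        for a, b in permutations(developers, 2)
    }
```

The covariates are keyed by ordered pairs because a relational event model scores each possible (sender, receiver) combination; these particular measures are symmetric, so both directions get the same values.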
Dataset
• Subset for testing multilevel analysis: 2 years
• Dates:
  – 2013-11-01 (the complete dataset starts at 2006-03-20, the first LTS release)
  – 2015-11-01, date of the 4.3 release
  – 15, 30, 45, 60, 75, 90 day moving windows
• Mailing lists:
  – 19 of the top mailing lists (out of over 200), excluding the top mailing list
  – 226,919 messages (out of 2,818,774 for the top 20, all dates)
• Source code:
  – Linux-stable tree
  – 177,113 commits (out of 603,006 for all dates)
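The moving windows listed above can be generated as follows. This sketch assumes fixed-length, half-open windows whose start advances one day at a time; the step size (`step_days`) and the interval convention are assumptions, since the slides give only the window lengths and the study dates.

```python
"""Sketch: fixed-length moving windows over the two-year study period."""
from datetime import date, timedelta

STUDY_START = date(2013, 11, 1)
STUDY_END = date(2015, 11, 1)               # Linux 4.3 release
WINDOW_LENGTHS = (15, 30, 45, 60, 75, 90)   # days


def moving_windows(start, end, length_days, step_days=1):
    """Yield half-open [window_start, window_end) intervals of fixed length.

    Windows must lie inside [start, end]; the start advances by step_days.
    The one-day default step is an assumption, not taken from the slides.
    """
    length = timedelta(days=length_days)
    step = timedelta(days=step_days)
    window_start = start
    while window_start + length <= end:
        yield window_start, window_start + length
        window_start += step


def in_window(window, day):
    """True if `day` falls in the half-open window [start, end)."""
    window_start, window_end = window
    return window_start <= day < window_end


def windows_per_length(step_days=1):
    """All windows for each window length used in the study."""
    return {
        n: list(moving_windows(STUDY_START, STUDY_END, n, step_days))
        for n in WINDOW_LENGTHS
    }
```

The study period spans 730 days, so with a one-day step there are 641 ninety-day windows and 716 fifteen-day windows; a coarser step reduces the number of networks to estimate.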
Relational Event Models
• Relational event models provide a “highly flexible framework for modeling actions within social settings, which permits likelihood-based inference for behavioral mechanisms with complex dependence.” (Butts, 2008, p. 155)
• Based on relational events: actions generated by a sender and directed toward a receiver. Each event is represented by its sender, receiver, action type, and time (Butts, 2008).
• Mailing list data, with a timestamp for each message, is well suited to relational event models.
• Each reply to a mailing list post can be thought of as an event created by a sender targeted at a receiver.
• Used to explain the likelihood of collaboration between two developers given the influence of the dimensions of proximity and other effects.
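The event representation and likelihood described above can be sketched for the ordinal (order-only) form of the model in Butts (2008): at each event, the observed (sender, receiver) pair competes against every ordered pair of actors, with probability proportional to exp(θ · u). This is a minimal illustration, not the relevent implementation used in the analysis: `reciprocity` is a hypothetical statistic, and `burn_in` is a hypothetical parameter showing one possible reading of the "first 200 events as control" step, where early events feed the statistics but not the likelihood.

```python
"""Sketch: log-likelihood of an ordinal relational event model (Butts, 2008).

Only the order of events is used (no inter-event times). The statistic
below is a hypothetical example, not a variable from the analysis.
"""
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class RelationalEvent:
    sender: str
    receiver: str
    action: str   # action type, e.g. "reply"
    time: float   # e.g. the message timestamp in seconds


def reciprocity(sender, receiver, history):
    """Hypothetical statistic: prior replies from `receiver` to `sender`."""
    return [sum(1 for e in history if e.sender == receiver and e.receiver == sender)]


def ordinal_rem_loglik(events, actors, theta, stats, burn_in=0):
    """Sum of log P(observed dyad | history) over the event sequence.

    Every ordered pair of distinct actors is at risk at each event, and
    P(s, r) = exp(theta . u(s, r)) / sum over the risk set of exp(theta . u).
    The first `burn_in` events (hypothetical parameter) only build history.
    """
    risk_set = list(permutations(actors, 2))
    history, loglik = [], 0.0
    for i, ev in enumerate(sorted(events, key=lambda e: e.time)):
        if i >= burn_in:
            scores = {
                dyad: sum(t * x for t, x in zip(theta, stats(dyad[0], dyad[1], history)))
                for dyad in risk_set
            }
            top = max(scores.values())  # subtract the max for numerical stability
            log_norm = top + math.log(sum(math.exp(v - top) for v in scores.values()))
            loglik += scores[(ev.sender, ev.receiver)] - log_norm
        history.append(ev)
    return loglik
```

Each event's normalizing sum runs over all n(n - 1) ordered pairs, so the cost grows with the number of events times the square of the number of actors; 1,000 active developers already give 999,000 dyads per event. This is one plausible reason estimation slows down so quickly as events are added.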
Results: A Series of Difficulties
• The REM struggled with the number of events:
  – Reduced to the first 500 events (1.5 days) to get the model to run (used the first 200 events as control, ran the model with 300 events)
  – Takes 6+ hours to estimate 600 events (3 days) on a large server.
  – Might have to do with the way we are loading variables into the model.
  – Possible other limitations with the REM / relevent software
Preliminary Results
• Model not yet complete; we are testing the waters now.
  – A tiny number of events will not represent the whole dataset.
  – Missing variables are likely to change these results.
  – Need to analyze per mailing list (mailing list level).
• Proximity looks promising as a theoretical framework:
  – Organizational proximity: less likely to reply to other employees of the same firm. Do they use internal corporate channels to collaborate?
  – Cognitive proximity: more likely to reply to people working in the same areas of code.
  – Geographic proximity: less likely to reply as the time zone difference increases.
Future Developments / Feedback
• We know the model has issues:
  – Seeking feedback on what we have done so far and on fitting a suitable model for multilevel networks.
• Multilevel: both aspects need to be developed:
  – Multilevel analysis of networks: multiple mailing lists at the same time (like classrooms within schools)
    • Mailing lists as levels? How do we do this?
  – Analysis of multilevel networks: complex models for networks, modeling organizational affiliation as a level.
    • Can we treat organizations as a level, instead of as an attribute of developers?
    • Need to look at the organizational level to see interactions by organization.
• Relational event models:
  – Options for modeling large event sequences in networks.
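One option for long event sequences is to sample the risk set: each observed event is compared with a small random sample of non-event dyads (case-control sampling) instead of all n(n - 1) ordered pairs. The sketch below is a minimal illustration under our own assumptions, not something tried in this work: events are plain (sender, receiver, time) tuples, `reciprocity` is a hypothetical statistic, and controls are drawn uniformly, in which case the sampled likelihood keeps the same conditional-logit form as the full one.

```python
"""Sketch: ordinal REM log-likelihood with a case-control sampled risk set."""
import math
import random


def reciprocity(sender, receiver, history):
    """Hypothetical statistic: prior events from `receiver` to `sender`."""
    return [sum(1 for s, r, _ in history if s == receiver and r == sender)]


def sample_controls(actors, realized, k, rng):
    """Draw k distinct non-event ordered pairs uniformly, by rejection."""
    n = len(actors)
    k = min(k, n * (n - 1) - 1)  # cannot sample more non-events than exist
    chosen, seen = [], {realized}
    while len(chosen) < k:
        pair = tuple(rng.sample(actors, 2))  # ordered pair of distinct actors
        if pair not in seen:
            seen.add(pair)
            chosen.append(pair)
    return chosen


def sampled_rem_loglik(events, actors, theta, stats, controls=10, seed=0):
    """Conditional-logit log-likelihood over the observed dyad plus sampled controls."""
    rng = random.Random(seed)
    actors = list(actors)
    history, loglik = [], 0.0
    for sender, receiver, time in sorted(events, key=lambda e: e[2]):
        candidates = [(sender, receiver)] + sample_controls(
            actors, (sender, receiver), controls, rng)
        scores = [sum(t * x for t, x in zip(theta, stats(s, r, history)))
                  for s, r in candidates]
        top = max(scores)  # subtract the max for numerical stability
        loglik += scores[0] - (top + math.log(sum(math.exp(v - top) for v in scores)))
        history.append((sender, receiver, time))
    return loglik
```

The `controls` parameter trades precision for speed: the work per event drops from n(n - 1) dyads to `controls` + 1, and sampling every non-event recovers the full likelihood.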