By Chris Zachor. Introduction Background Changes Methodology Data Collection Network...
![Page 1: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/1.jpg)
SOFTWARE COLLABORATION NETWORKS
By Chris Zachor
![Page 2: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/2.jpg)
Overview
Introduction
Background
Changes
Methodology
Data Collection
Network Topologies
Measures
Tools
Conclusion
Questions
![Page 3: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/3.jpg)
Introduction
Use network analysis to better understand the developers in the SourceForge and GitHub communities
Identify key differences (if any) within the two communities
Examine the diversity of collaborations within these two communities
![Page 4: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/4.jpg)
Changes
The addition of GitHub to the study
Contains some of the same attributes to allow for a comparison
Other communities were looked at, but they either were not large enough or did not provide enough public data.
![Page 5: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/5.jpg)
Data Collection
Crawling the websites using a simple Perl script and regular expressions
Collect a project list from SourceForge
www.sourceforge.net/projects/projectTitle
No specified request limit
Check for duplicates
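The crawl itself used a Perl script; the following is only a minimal Python sketch of the same idea: pull project names out of a fetched page with a regular expression and skip any that were already collected. The URL pattern comes from the slide. The function name `extract_project_names`, the allowed characters in a project name, and the case-insensitive duplicate check are assumptions made for illustration.

```python
import re

# URL shape taken from the slide: sourceforge.net/projects/<projectTitle>.
# The character set allowed in a project title is an assumption.
PROJECT_LINK = re.compile(r"sourceforge\.net/projects/([A-Za-z0-9_.-]+)")


def extract_project_names(html, seen):
    """Return project names found in html that are not already in seen.

    seen is a set of lower-cased names collected so far; it is updated
    in place so that later pages skip the same projects.
    """
    new_names = []
    for name in PROJECT_LINK.findall(html):
        key = name.lower()  # assumption: names differing only in case are duplicates
        if key not in seen:
            seen.add(key)
            new_names.append(name)
    return new_names


seen = set()
page = (
    '<a href="http://www.sourceforge.net/projects/alpha">Alpha</a>'
    '<a href="http://www.sourceforge.net/projects/beta">Beta</a>'
    '<a href="http://www.sourceforge.net/projects/alpha">Alpha again</a>'
)
first = extract_project_names(page, seen)   # new names from the first pass
second = extract_project_names(page, seen)  # nothing new on a repeat pass
```

Keeping the `seen` set outside the function lets one set be shared across every page of the crawl.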
![Page 6: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/6.jpg)
SourceForge Project Page
![Page 7: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/7.jpg)
GitHub Crawling
Using the GitHub API provides our data
Limited to 60 API calls per minute
Use multiple computers to collect all 1.5 million projects
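A rough sketch of why several computers are needed, and of how a single crawler might stay under the limit. The figures of 60 calls per minute and 1.5 million projects come from the slide. One API call per project, a separate limit for each machine, and the names `MinuteRateLimiter` and `days_to_collect` are assumptions made for illustration.

```python
import time


class MinuteRateLimiter:
    """Allow at most `limit` calls in any rolling 60-second window."""

    def __init__(self, limit=60, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.clock = clock
        self.sleep = sleep
        self.stamps = []  # times of the calls made in the last 60 seconds

    def wait(self):
        """Block until one more call is allowed, then record it."""
        now = self.clock()
        self.stamps = [t for t in self.stamps if now - t < 60.0]
        if len(self.stamps) >= self.limit:
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(60.0 - (now - self.stamps[0]))
            now = self.clock()
            self.stamps = [t for t in self.stamps if now - t < 60.0]
        self.stamps.append(now)


def days_to_collect(projects, calls_per_minute=60, machines=1, calls_per_project=1):
    """Estimate crawl duration in days, assuming each machine has its own limit."""
    minutes = projects * calls_per_project / (calls_per_minute * machines)
    return minutes / (60 * 24)
```

Under these assumptions, `days_to_collect(1_500_000)` is a little over 17 days on one machine, and the estimate shrinks in proportion to the number of machines. A crawler would call `wait()` on the limiter before each API request.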
![Page 8: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/8.jpg)
GitHub Project Page
![Page 9: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/9.jpg)
GitHub API
![Page 10: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/10.jpg)
Developer/Project Network
![Page 11: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/11.jpg)
Project-Developer Network
![Page 12: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/12.jpg)
Measures and Metrics
Degree
Clustering Coefficient
Modularity
Power Law
Small World Phenomenon
![Page 13: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/13.jpg)
Degree
Average number of projects worked on by a developer
Average number of collaborations
Average number of developers on a project
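The three averages above can be sketched from a project-to-developers mapping. This is an illustration only: the input format, the function name `degree_summary`, and the reading of "collaborations" as the number of distinct co-developers per developer are assumptions, and the mapping is assumed to be non-empty.

```python
from itertools import combinations


def degree_summary(members):
    """members maps a project name to the set of developers working on it."""
    projects_per_dev = {}
    collaborators = {}
    for devs in members.values():
        for dev in devs:
            projects_per_dev[dev] = projects_per_dev.get(dev, 0) + 1
            collaborators.setdefault(dev, set())
        # Every pair of developers on the same project counts as collaborating.
        for a, b in combinations(sorted(devs), 2):
            collaborators[a].add(b)
            collaborators[b].add(a)
    n_devs = len(projects_per_dev)
    return {
        "avg_projects_per_developer": sum(projects_per_dev.values()) / n_devs,
        "avg_collaborators_per_developer": sum(len(c) for c in collaborators.values()) / n_devs,
        "avg_developers_per_project": sum(len(d) for d in members.values()) / len(members),
    }


# Made-up example data, not taken from the study.
example = {"p1": {"ann", "bob"}, "p2": {"bob", "cat", "dan"}, "p3": {"ann"}}
summary = degree_summary(example)
```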
![Page 14: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/14.jpg)
Clustering Coefficient
Examine how likely developers are to stick together in groups
Examine both the average clustering coefficient for the entire network and the local clustering coefficient for nodes of interest
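A minimal sketch of both quantities for an undirected developer network stored as an adjacency mapping. The local coefficient of a node is the fraction of pairs of its neighbours that are themselves connected. Counting nodes with fewer than two neighbours as 0 in the average is a common convention assumed here, and the example network is made up.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of node's neighbours that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbours
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Made-up network: ann, bob and cat form a triangle; dan only knows bob.
adj = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"bob"},
}
bob_local = local_clustering(adj, "bob")
network_average = average_clustering(adj)
```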
![Page 15: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/15.jpg)
Modularity
Provides a measure of how diverse developer collaborations are
Range -1 < Q < 1
Values closer to one show less diversity in collaboration choices
Values closer to negative one show more diversity in collaboration choices
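The slide does not state which definition of Q is used. As a sketch, assuming the commonly used Newman modularity for an undirected, unweighted network, Q sums, over every pair of nodes in the same community, the difference between the actual adjacency and the adjacency expected from the node degrees alone. The function name, the adjacency format, and the example network are made up for illustration.

```python
def modularity(adj, community):
    """Newman modularity Q (assumed definition) for a given community assignment.

    adj maps each node to the set of its neighbours; community maps each
    node to a community label. Quadratic in the number of nodes, so this
    is only suitable for small examples.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the number of edges
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue
            a_ij = 1.0 if j in adj[i] else 0.0
            q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m


# Two triangles (a, b, c) and (d, e, f) joined by the single edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
q_triangles = modularity(adj, by_triangle)
```

Grouping the example by triangle gives a clearly positive Q, while placing every node in a single community gives a Q of zero.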
![Page 16: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/16.jpg)
Power Law
Previous studies have found that the SourceForge community follows a power law
No such study has been done on the GitHub community
Few developers should be a part of many projects, while many developers should be involved with only one project
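One rough way to check for this shape is to count how many developers work on k projects and fit a straight line to that distribution on log-log axes. A power law appears as a roughly straight line with a negative slope. The sketch below assumes a mapping from developer to project count and uses a plain least-squares fit, which is only a quick check and not a rigorous test. The function names are made up for illustration.

```python
import math
from collections import Counter


def degree_distribution(projects_per_dev):
    """Map k to the number of developers who work on exactly k projects."""
    return Counter(projects_per_dev.values())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(k).

    Needs at least two distinct values of k, all of them positive.
    """
    ks = sorted(distribution)
    xs = [math.log(k) for k in ks]
    ys = [math.log(distribution[k]) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic distribution with count = 3600 / k**2, not data from the study.
synthetic = {1: 3600, 2: 900, 3: 400, 4: 225, 5: 144, 6: 100}
slope = loglog_slope(synthetic)
```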
![Page 17: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/17.jpg)
Small World Phenomenon
Previous studies have shown the SourceForge community exhibits small world properties
Once again, no study has been done on the GitHub community
Using Pajek, I will create a random network with the same number of nodes and edges
Then, compare the clustering coefficient and the average shortest path
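The study uses Pajek for this step; the following is only a Python sketch of the comparison idea. It builds a random network with the same nodes and the same number of edges, then measures the average shortest path with breadth-first search. Averaging over reachable pairs only, the fixed random seed, and the function names are assumptions made for illustration.

```python
import random
from collections import deque


def random_graph(nodes, n_edges, seed=0):
    """Random undirected graph on the given nodes with exactly n_edges edges."""
    nodes = list(nodes)
    if n_edges > len(nodes) * (len(nodes) - 1) // 2:
        raise ValueError("more edges requested than the nodes allow")
    rng = random.Random(seed)
    adj = {n: set() for n in nodes}
    added = 0
    while added < n_edges:
        a, b = rng.sample(nodes, 2)  # two distinct nodes
        if b not in adj[a]:
            adj[a].add(b)
            adj[b].add(a)
            added += 1
    return adj


def average_shortest_path(adj):
    """Mean shortest-path length over all reachable ordered pairs of nodes."""
    total = 0
    pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Made-up example: a chain of four developers.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
chain_path = average_shortest_path(chain)
baseline = random_graph(chain, 3)
```

A small world network would show a clustering coefficient well above that of the random baseline while keeping a similarly short average path.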
![Page 18: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/18.jpg)
Tools
Perl
Pajek
cURL
wget
GUESS
![Page 19: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/19.jpg)
Conclusion
Through the use of network analysis, we hope to gain a better understanding of the developers in the SourceForge and GitHub communities.
![Page 20: By Chris Zachor. Introduction Background Changes Methodology Data Collection Network Topologies Measures Tools Conclusion Questions.](https://reader035.fdocuments.us/reader035/viewer/2022062217/56649e9d5503460f94b9ed27/html5/thumbnails/20.jpg)
Questions?
Suggestions? Comments?