The Perils of Doing Software Engineering Research Using GitHub Data
What is in GitHub?
Daniel M. German
Researcher states:
“40% of pull requests are not merged”
● Based on simply querying GHTorrent data
● But it ignores what really happens
● Many pull requests are merged without being marked as merged in GitHub
● GHTorrent data has many potential threats to validity
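The gap between the merged flag and what really happens can be sketched in Python. Everything here is an assumption for illustration: the record layout (`merged_flag`, `head_sha`), the two heuristics, and the toy data are hypothetical, not the GHTorrent schema or the method used in the talk. A pull request counts as merged if the site marked it so, if its head commit shows up in the base branch history, or if a base-branch commit message says it closes that pull request number.

```python
import re
from dataclasses import dataclass


@dataclass
class PullRequest:
    number: int
    merged_flag: bool  # what the hosting site reports (hypothetical field)
    head_sha: str      # last commit of the pull request (hypothetical field)


# Matches phrases such as "closes #3", "Fixes #12", "Merge pull request #7".
CLOSES = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|merge[sd]?)\s+(?:pull request\s+)?#(\d+)",
    re.IGNORECASE,
)


def effectively_merged(pr, base_shas, base_messages):
    """Return True if the flag or either heuristic says the PR was merged."""
    if pr.merged_flag:
        return True
    # Heuristic 1: the PR's head commit is present in the base branch.
    if pr.head_sha in base_shas:
        return True
    # Heuristic 2: a base-branch commit message references the PR number.
    for message in base_messages:
        if any(int(n) == pr.number for n in CLOSES.findall(message)):
            return True
    return False


def unmerged_ratio(prs, base_shas, base_messages, use_heuristics):
    """Fraction of pull requests considered not merged."""
    if not prs:
        return 0.0
    if use_heuristics:
        merged = sum(effectively_merged(pr, base_shas, base_messages) for pr in prs)
    else:
        merged = sum(pr.merged_flag for pr in prs)
    return (len(prs) - merged) / len(prs)


# Toy data, invented for the example.
prs = [
    PullRequest(1, True, "a1"),
    PullRequest(2, False, "b2"),   # commits applied to base without the flag
    PullRequest(3, False, "c3"),   # closed via a commit message
    PullRequest(4, False, "d4"),
    PullRequest(5, False, "e5"),
]
base_shas = {"a1", "b2"}
base_messages = ["Refactor parser, closes #3"]

naive = unmerged_ratio(prs, base_shas, base_messages, use_heuristics=False)
adjusted = unmerged_ratio(prs, base_shas, base_messages, use_heuristics=True)
```

On this toy data the flag alone reports 4 of 5 pull requests as unmerged, while the heuristics reduce that to 2 of 5. Both heuristics can misfire (rebased commits change SHAs, and messages can mention a number without merging), so they introduce their own threats to validity.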
What is GitHub used for?
"I store my presentations in github. I don't need USB stick anymore!"
Are there potential threats to validity for studies that assume GitHub is only about software engineering?
Methodology
● Reuse:
– Surveys
– Data analysis for other papers
● Mixed methods:
– Quantitative, and
– Qualitative
Uses:
Most projects are inactive
Social?
67% of projects are personal repos
95% have 3 or fewer committers
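A figure like the committer share above can be derived from commit records along these lines. This is a minimal Python sketch under stated assumptions: the input is taken to be a list of `(repo, author)` pairs, the function names are hypothetical, and the sample records are invented, not data from the study.

```python
from collections import defaultdict


def committers_per_repo(commits):
    """Count distinct commit authors per repository.

    commits: iterable of (repo, author) pairs (assumed input shape).
    """
    authors = defaultdict(set)
    for repo, author in commits:
        authors[repo].add(author)
    return {repo: len(names) for repo, names in authors.items()}


def share_with_at_most(counts, limit):
    """Fraction of repositories with at most `limit` distinct committers."""
    if not counts:
        return 0.0
    small = sum(1 for n in counts.values() if n <= limit)
    return small / len(counts)


# Invented sample records for illustration only.
commits = [
    ("alice/slides", "alice"),
    ("alice/slides", "alice"),
    ("bob/dotfiles", "bob"),
    ("org/tool", "carol"),
    ("org/tool", "dave"),
    ("org/tool", "erin"),
    ("org/tool", "frank"),
]

counts = committers_per_repo(commits)
small_share = share_with_at_most(counts, limit=3)
```

Counting by author name or email is itself a simplification, since one person may commit under several identities.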
Self contained?
“Any serious project would have to have some separate infrastructure - mailing lists, forums, irc channels and their archives, build farms, etc. [...] Thus while GitHub and all other project hosts are used for collaboration, they are not and can not be a complete solution.”
But... what about the users?
Switch to http://osrc.dfm.io/dmgerman
Is it still worth exploring GitHub?
Definitely!