Cregit Recovering token level authorship from Git
![Page 1: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/1.jpg)
cregit: Who Authored the Kernel?
Recovering Token-Level Authorship Information from Git
Daniel M German, University of Victoria
Kate Stewart, Linux Foundation
Bram Adams, Polytechnique Montréal
![Page 2: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/2.jpg)
![Page 3: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/3.jpg)
Cincinnati Library. Image in the public domain.
![Page 4: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/4.jpg)
diff-able
![Page 5: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/5.jpg)
Image in the public domain
![Page 6: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/6.jpg)
![Page 7: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/7.jpg)
![Page 8: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/8.jpg)
“History is written by the winners”
By Y. Karsh. Image in the public domain in Canada. Copyrighted in the US
![Page 9: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/9.jpg)
![Page 10: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/10.jpg)
“Archeology is the search for fact, not truth. If it's truth you're interested in, Dr. Tyree's Philosophy class is right down the hall.”
-- Indiana Jones
Image Copyright Walt Disney Company
![Page 11: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/11.jpg)
The history in git is likely to be incomplete
![Page 12: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/12.jpg)
Yet, what can we do with it?
![Page 13: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/13.jpg)
The Dream, by H. Rousseau. In the public domain.
![Page 14: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/14.jpg)
![Page 15: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/15.jpg)
![Page 16: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/16.jpg)
![Page 17: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/17.jpg)
![Page 18: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/18.jpg)
![Page 19: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/19.jpg)
![Page 20: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/20.jpg)
![Page 21: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/21.jpg)
![Page 22: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/22.jpg)
![Page 23: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/23.jpg)
![Page 24: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/24.jpg)
Image in the public domain
![Page 25: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/25.jpg)
Evolutionary Views of Version Control (VC) Repositories
![Page 26: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/26.jpg)
![Page 27: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/27.jpg)
![Page 28: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/28.jpg)
![Page 29: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/29.jpg)
![Page 30: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/30.jpg)
![Page 31: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/31.jpg)
Per Line
Per Token
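The Per Line / Per Token contrast can be illustrated with a minimal sketch: replay a file's history, diff each new version against the previous one, and carry authorship forward for units that did not change. The regex tokenizer, the `blame` function, and the use of Python's `difflib` are our own assumptions for illustration; this is not cregit's implementation, which works on the Git repository itself.

```python
import difflib
import re
from typing import Callable, List, Sequence, Tuple

# Hypothetical, simplistic C-like tokenizer: identifiers, integers, or any
# other single non-space character. A real tool would use a proper lexer.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(src: str) -> List[str]:
    return TOKEN_RE.findall(src)


def blame(versions: Sequence[Tuple[str, str]],
          split: Callable[[str], List[str]]) -> List[Tuple[str, str]]:
    """Replay (author, source) versions in commit order and return
    (unit, author) pairs for the final version. A unit is whatever
    `split` produces: lines (str.splitlines) or tokens (tokenize)."""
    blamed: List[Tuple[str, str]] = []
    for author, src in versions:
        new_units = split(src)
        old_units = [unit for unit, _ in blamed]
        matcher = difflib.SequenceMatcher(a=old_units, b=new_units, autojunk=False)
        result: List[Tuple[str, str]] = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged units keep the author who last introduced them.
                result.extend(blamed[i1:i2])
            else:
                # "replace" and "insert" units go to this commit's author;
                # for "delete", new_units[j1:j2] is empty and adds nothing.
                result.extend((unit, author) for unit in new_units[j1:j2])
        blamed = result
    return blamed


history = [("alice", "int x = 1;\n"), ("bob", "int x = 2;\n")]
print(blame(history, str.splitlines))  # [('int x = 2;', 'bob')]
print(blame(history, tokenize))
# [('int', 'alice'), ('x', 'alice'), ('=', 'alice'), ('2', 'bob'), (';', 'alice')]
```

At line level, bob's one-character edit makes him the owner of the whole line; at token level he owns only the `2`, and alice keeps the rest.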
![Page 32: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/32.jpg)
![Page 33: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/33.jpg)
![Page 34: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/34.jpg)
Linux History
![Page 35: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/35.jpg)
Warning
● The author (in git parlance) of a commit is not necessarily the author of the code it contains, for example:
– Code imported from another source
– Refactorings
– Moving code
![Page 36: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/36.jpg)
Up to Linux 4.7
Persons in blame:
● Line-level: 12,005
● Token-level: 12,087
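Person counts like these come from blame output. As a minimal sketch of how a line-level count could be taken, the code below parses `git blame --line-porcelain`, which repeats an `author` header for every blamed line. The function names are hypothetical, and counting distinct author-name strings is a simplification: it does not merge identities (one person under two spellings counts twice), and we do not know how the figures above handled that.

```python
import subprocess
from typing import Iterable, Set


def authors_from_porcelain(porcelain: str) -> Set[str]:
    """Collect distinct author names from `git blame --line-porcelain` output.

    That format emits an `author <name>` header for every blamed line.
    Source lines are prefixed with a tab, so they can never match here.
    """
    authors: Set[str] = set()
    for line in porcelain.splitlines():
        # The trailing space excludes author-mail, author-time and author-tz.
        if line.startswith("author "):
            authors.add(line[len("author "):])
    return authors


def persons_in_blame(repo: str, paths: Iterable[str], rev: str = "HEAD") -> Set[str]:
    """Union of blamed author names over `paths` at revision `rev`."""
    persons: Set[str] = set()
    for path in paths:
        out = subprocess.run(
            ["git", "-C", repo, "blame", "--line-porcelain", rev, "--", path],
            check=True, capture_output=True, encoding="utf-8", errors="replace",
        ).stdout
        persons |= authors_from_porcelain(out)
    return persons
```

For example, `len(persons_in_blame("linux", ["kernel/fork.c"], rev="v4.7"))` would count the people blamed for one file in a local kernel clone (the clone path is an assumption).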
![Page 37: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/37.jpg)
![Page 38: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/38.jpg)
![Page 39: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/39.jpg)
![Page 40: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/40.jpg)
![Page 41: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/41.jpg)
![Page 42: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/42.jpg)
![Page 43: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/43.jpg)
Token vs. Line (Linux)
![Page 44: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/44.jpg)
Token vs. Line (kernel/)
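One way to compare token-level and line-level results on one's own data is to compute the share of code each author holds under each kind of blame and measure how far apart the two distributions are. The sketch below takes `(unit, author)` pairs from any blame tool; the helper names and the choice of total variation distance as the comparison are our assumptions for illustration, not the method behind these figures.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def author_shares(blamed: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Fraction of blamed units (lines or tokens) attributed to each author."""
    counts = Counter(author for _, author in blamed)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Half the L1 distance between two author-share distributions.

    0.0 means identical shares; 1.0 means no overlap at all.
    """
    authors = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in authors)


line_blame = [("int x = 2;", "bob")]
token_blame = [("int", "alice"), ("x", "alice"), ("=", "alice"), ("2", "bob"), (";", "alice")]
print(total_variation(author_shares(line_blame), author_shares(token_blame)))
```

On a single edited line the two views disagree sharply; over a whole subsystem, many such differences may cancel out, which is what a small distance would indicate.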
![Page 45: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/45.jpg)
Many small changes
Non-merge commits that modified C and H files, as a percentage of all commits:
● 9.5% of commits added at most 3 c-tokens and removed at most 3 c-tokens
● 7% of commits added no c-tokens but removed some c-tokens
● 3.8% of commits added one c-token and removed one c-token
● 22.4% of commits added at most 10 c-tokens and removed at most 10 c-tokens
● 50% of commits added at most 60 c-tokens and removed at most 60 c-tokens
● 2 commits added at least 1M c-tokens and removed at least 1M c-tokens
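The statistics above classify each commit by how many C tokens it added and removed. A minimal sketch of that classification follows, assuming each commit has already been reduced to an `(added, removed)` pair (for instance by summing `token_delta` over the C and H files it touches). The tokenizer, the helper names, and the use of `difflib` to count tokens are hypothetical; the optional `total` argument lets the denominator be all commits, as the slide specifies.

```python
import difflib
import re
from typing import Optional, Sequence, Tuple

# Hypothetical simplistic tokenizer: identifiers, integers, single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def token_delta(old_src: str, new_src: str) -> Tuple[int, int]:
    """Return (tokens added, tokens removed) between two versions of a file."""
    matcher = difflib.SequenceMatcher(
        a=TOKEN_RE.findall(old_src), b=TOKEN_RE.findall(new_src), autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # tokens only in the old version
            added += j2 - j1    # tokens only in the new version
    return added, removed


def share_within(deltas: Sequence[Tuple[int, int]], k: int,
                 total: Optional[int] = None) -> float:
    """Share of commits that added at most k and removed at most k tokens."""
    n = total if total is not None else len(deltas)
    return sum(1 for a, r in deltas if a <= k and r <= k) / n


def share_pure_removal(deltas: Sequence[Tuple[int, int]],
                       total: Optional[int] = None) -> float:
    """Share of commits that added no tokens but removed at least one."""
    n = total if total is not None else len(deltas)
    return sum(1 for a, r in deltas if a == 0 and r > 0) / n


print(token_delta("int x = 1;", "int x = 2;"))  # (1, 1)
```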
![Page 46: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/46.jpg)
C-Churn
● C-churn = C tokens added minus C tokens removed, in non-merge commits
Non-merge commits that modified C and H files, as a percentage of all commits:
● 10% of commits had c-churn == 0
● 48% had c-churn <= 10
● 26% had negative c-churn
● 2 commits had c-churn >= 1M
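The c-churn figures follow directly from the definition above. This minimal sketch computes them from hypothetical `(added, removed)` pairs, reading "c-churn <= 10" literally, so that bucket also contains the zero and negative cases. The function names and the optional `total` denominator are our assumptions.

```python
from typing import Dict, Optional, Sequence, Tuple


def c_churn(added: int, removed: int) -> int:
    # As defined on the slide: C tokens added minus C tokens removed.
    return added - removed


def churn_summary(deltas: Sequence[Tuple[int, int]],
                  total: Optional[int] = None) -> Dict[str, float]:
    """Shares of commits with zero, at-most-10, and negative c-churn.

    `deltas` holds one (added, removed) pair per non-merge commit touching
    C/H files; `total` is the denominator (all commits, per the slide).
    """
    n = total if total is not None else len(deltas)
    churns = [c_churn(a, r) for a, r in deltas]
    return {
        "zero": sum(c == 0 for c in churns) / n,
        "at_most_10": sum(c <= 10 for c in churns) / n,  # includes zero and negative
        "negative": sum(c < 0 for c in churns) / n,
    }


print(churn_summary([(1, 1), (0, 5), (20, 3), (2, 2)]))
# {'zero': 0.5, 'at_most_10': 0.75, 'negative': 0.25}
```

Note that zero churn does not mean no change: `(1, 1)` and `(500, 500)` both give a c-churn of 0, which is why the add/remove breakdown on the previous slide carries information that churn alone does not.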
![Page 47: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/47.jpg)
Conclusion
● In the large
– Token-level and line-level authorship are equivalent
● In the small
– Token-level authorship provides a fine-grained view of the evolution of the code
![Page 48: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/48.jpg)
![Page 49: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/49.jpg)
![Page 50: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/50.jpg)
![Page 51: Cregit Recovering token level authorship from Git](https://reader031.fdocuments.us/reader031/viewer/2022030115/589d0adc1a28ab255c8b675f/html5/thumbnails/51.jpg)