Usage and attribution of Stack Overflow code snippets in GitHub projects

Cited by: 48
Authors
Baltes, Sebastian [1]
Diehl, Stephan [1]
Affiliation
[1] University of Trier, Software Engineering Group, Trier, Germany
Keywords
Code snippets; Licensing; Stack Overflow; GitHub; Online survey; Mining software repositories; Reuse
DOI
10.1007/s10664-018-9650-5
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Using those snippets raises maintenance and legal issues. SO's license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO's license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results. For the different sets of projects that we analyzed, the ratio of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required. Of the surveyed developers, almost one half admitted copying code from SO without attribution and about two thirds were not aware of the license of SO code snippets and its implications.
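The abstract counts projects whose files contain a reference to SO, and it treats a link to the original question or answer as the attribution that CC BY-SA 3.0 requires. As a rough illustration only, the following Python sketch flags such references in the text of a source file. The regular expression, the function names `find_so_references` and `has_so_reference`, and the post ID in the example are hypothetical; this is not the detection procedure used in the study. It recognizes only links of the form `stackoverflow.com/questions/<id>`, `/q/<id>`, and `/a/<id>`, so an answer link embedded in a question URL (for example `/questions/123/slug/456#456`) is reported as question 123.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified pattern (not the study's matcher): it only recognizes
# stackoverflow.com/questions/<id>, /q/<id>, and /a/<id> links.
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(questions|q|a)/(\d+)",
    re.IGNORECASE,
)


def find_so_references(source: str) -> List[Tuple[str, int]]:
    """Return (kind, post_id) pairs for each Stack Overflow link in `source`.

    kind is "answer" for /a/ links and "question" for /questions/ and /q/ links.
    """
    refs = []
    for match in SO_LINK.finditer(source):
        kind = "answer" if match.group(1).lower() == "a" else "question"
        refs.append((kind, int(match.group(2))))
    return refs


def has_so_reference(source: str) -> bool:
    """Return True if the file text contains at least one Stack Overflow link."""
    return SO_LINK.search(source) is not None


if __name__ == "__main__":
    # A Java file whose copied method carries an attribution comment
    # (the answer ID 1234567 is made up for illustration).
    java_file = '''
    // Adapted from https://stackoverflow.com/a/1234567 (CC BY-SA 3.0)
    public static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }
    '''
    print(find_so_references(java_file))  # [('answer', 1234567)]
    print(has_so_reference("int x = 0;"))  # False
```

Per the abstract, a link alone does not make a usage compatible with CC BY-SA 3.0: the derived work must also adopt a compatible license, which a text scan like this cannot check.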
Pages: 1259-1295
Number of pages: 37
Related papers (50 in total)
  • [1] Usage and attribution of Stack Overflow code snippets in GitHub projects
    Baltes, Sebastian
    Diehl, Stephan
    Empirical Software Engineering, 2019, 24: 1259-1295
  • [2] Attribution Required: Stack Overflow Code Snippets in GitHub Projects
    Baltes, Sebastian
    Kiefer, Richard
    Diehl, Stephan
    Proceedings of the 2017 IEEE/ACM 39th International Conference on Software Engineering Companion (ICSE-C 2017), 2017: 161-163
  • [3] Stack Overflow in Github: Any Snippets There?
    Yang, Di
    Martins, Pedro
    Saini, Vaibhav
    Lopes, Cristina
    2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR 2017), 2017: 280-290
  • [4] Studying the Change Histories of Stack Overflow and GitHub Snippets
    Manes, Saraj Singh
    Baysal, Olga
    2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), 2021: 283-294
  • [5] Toxic Code Snippets on Stack Overflow
    Ragkhitwetsagul, Chaiyong
    Krinke, Jens
    Paixao, Matheus
    Bianco, Giuseppe
    Oliveto, Rocco
    IEEE Transactions on Software Engineering, 2021, 47 (3): 560-581
  • [6] From Query to Usable Code: An Analysis of Stack Overflow Code Snippets
    Yang, Di
    Hussain, Aftab
    Lopes, Cristina Videira
    13th Working Conference on Mining Software Repositories (MSR 2016), 2016: 391-401
  • [7] Identifying Versions of Libraries Used in Stack Overflow Code Snippets
    Zerouali, Ahmed
    Velazquez-Rodriguez, Camilo
    De Roover, Coen
    2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), 2021: 341-345
  • [8] Mining the Usage of Reactive Programming APIs: A Study on GitHub and Stack Overflow
    Zimmerle, Carlos
    Gama, Kiev
    Castor, Fernando
    Filho, Jose Murilo Mota
    Proceedings of the 2022 Mining Software Repositories Conference (MSR 2022), 2022: 203-214
  • [9] Augmenting Stack Overflow with API Usage Patterns Mined from GitHub
    Reinhardt, Anastasia
    Zhang, Tianyi
    Mathur, Mihir
    Kim, Miryung
    ESEC/FSE'18: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018: 880-883