Usage and attribution of Stack Overflow code snippets in GitHub projects

Cited by: 48
Authors
Baltes, Sebastian [1]
Diehl, Stephan [1]
Affiliation
[1] Univ Trier, Software Engn Grp, Trier, Germany
Keywords
Code snippets; Licensing; Stack Overflow; GitHub; Online survey; Mining software repositories; REUSE
DOI
10.1007/s10664-018-9650-5
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large number of copyable code snippets. Using those snippets raises maintenance and legal issues. SO's license (CC BY-SA 3.0) requires attribution, i.e., referencing the original question or answer, and requires derived work to adopt a compatible license. While there is a heated debate on SO's license model for code snippets and the required attribution, little is known about the extent to which snippets are copied from SO without proper attribution. We present results of a large-scale empirical study analyzing the usage and attribution of non-trivial Java code snippets from SO answers in public GitHub (GH) projects. We followed three different approaches to triangulate an estimate for the ratio of unattributed usages and conducted two online surveys with software developers to complement our results. For the different sets of projects that we analyzed, the ratio of projects containing files with a reference to SO varied between 3.3% and 11.9%. We found that at most 1.8% of all analyzed repositories containing code from SO used the code in a way compatible with CC BY-SA 3.0. Moreover, we estimate that at most a quarter of the copied code snippets from SO are attributed as required. Of the surveyed developers, almost half admitted copying code from SO without attribution, and about two thirds were not aware of the license of SO code snippets and its implications.
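The project-level measure above ("projects containing files with a reference to SO") can be illustrated with a small sketch. It is not the authors' pipeline: the class name `SoReferenceScanner`, the regular expression, and both helper methods are hypothetical, and the sketch only recognizes plain `stackoverflow.com/questions/…`, `/q/…`, and `/a/…` links in file text. Note that such a link is only the attribution part of CC BY-SA 3.0; whether a use is license-compatible also depends on the license of the derived work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SoReferenceScanner {
    // Hypothetical pattern: matches links to SO questions (/questions/<id>, /q/<id>)
    // or answers (/a/<id>), with optional scheme and "www." prefix.
    // Group 1 captures the first numeric post ID in the link.
    private static final Pattern SO_LINK = Pattern.compile(
        "(?:https?://)?(?:www\\.)?stackoverflow\\.com/(?:questions|q|a)/(\\d+)");

    /** Returns the numeric post IDs referenced in the given file content. */
    static List<String> findReferencedPostIds(String fileContent) {
        List<String> ids = new ArrayList<>();
        Matcher m = SO_LINK.matcher(fileContent);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /**
     * Percentage of projects (each given as a list of file contents) that
     * contain at least one file with an SO link. Returns 0.0 for no projects.
     */
    static double percentProjectsWithReference(List<List<String>> projects) {
        if (projects.isEmpty()) {
            return 0.0;
        }
        long withRef = projects.stream()
            .filter(files -> files.stream().anyMatch(f -> SO_LINK.matcher(f).find()))
            .count();
        return 100.0 * withRef / projects.size();
    }

    public static void main(String[] args) {
        String attributed = "// Adapted from https://stackoverflow.com/a/12345\nint x = 1;";
        String plain = "int y = 2;";
        System.out.println(findReferencedPostIds(attributed)); // prints [12345]
        // One of two projects references SO.
        System.out.println(percentProjectsWithReference(
            List.of(List.of(attributed, plain), List.of(plain)))); // prints 50.0
    }
}
```

A textual link check like this is only a lower-bound signal: snippets copied without any link are invisible to it, which is why the study also relied on other approaches to estimate unattributed usage.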
Pages: 1259-1295
Page count: 37
Related papers (50 in total)
  • [21] Code2Que: A tool for improving question titles from mined code snippets in stack overflow
    Gao, Zhipeng
    Xia, Xin
    Lo, David
    Grundy, John
    Li, Yuan-Fang
    ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 1525-1529
  • [22] DICOS: Discovering Insecure Code Snippets from Stack Overflow Posts by Leveraging User Discussions
    Hong, Hyunji
    Woo, Seunghoon
    Lee, Heejo
    37TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2021, 2021, pp. 194-206
  • [23] Student Experiences with GitHub and Stack Overflow: An Exploratory Study
    Bhasin, Trishala
    Murray, Adam
    Storey, Margaret-Anne
    2021 IEEE/ACM 13TH INTERNATIONAL WORKSHOP ON COOPERATIVE AND HUMAN ASPECTS OF SOFTWARE ENGINEERING (CHASE 2021), 2021, pp. 81-90
  • [24] Code Duplication on Stack Overflow
    Baltes, Sebastian
    Treude, Christoph
    2020 IEEE/ACM 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: NEW IDEAS AND EMERGING RESULTS (ICSE-NIER 2020), 2020, pp. 13-16
  • [25] Code Reuse in Stack Overflow and Popular Open Source Java Projects
    Lotter, Adriaan
    Licorish, Sherlock A.
    Savarimuthu, Bastin Tony Roy
    Meldrum, Sarah
    2018 25TH AUSTRALASIAN SOFTWARE ENGINEERING CONFERENCE (ASWEC), 2018, pp. 141-150
  • [26] Gistable: Evaluating the Executability of Python Code Snippets on GitHub
    Horton, Eric
    Parnin, Chris
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME), 2018, pp. 217-227
  • [27] Geek Talents: Who are the Top Experts on GitHub and Stack Overflow?
    Tian, Yijun
    Ng, Waii
    Cao, Jialiang
    McIntosh, Suzanne
    CMC-COMPUTERS MATERIALS & CONTINUA, 2019, 61 (02), pp. 465-479
  • [28] Automatic title completion for Stack Overflow posts and GitHub issues
    Chen, Xiang
    Pei, Wenlong
    Yang, Shaoyu
    Zhou, Yanlin
    Zhang, Zichen
    Pei, Jiahua
    EMPIRICAL SOFTWARE ENGINEERING, 2024, 29 (05)
  • [29] Studying Software Developer Expertise and Contributions in Stack Overflow and GitHub
    Vadlamani, Sri Lakshmi
    Baysal, Olga
    2020 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2020), 2020, pp. 312-323
  • [30] Understanding Stack Overflow Code Fragments
    Treude, Christoph
    Robillard, Martin P.
    2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME), 2017, pp. 509-513