A survey on the use of topic models when mining software repositories

Cited by: 118
Authors
Chen, Tse-Hsun [1]
Thomas, Stephen W. [1]
Hassan, Ahmed E. [1]
Affiliations
[1] Queens Univ, SAIL, Kingston, ON, Canada
Keywords
Topic modeling; LDA; LSI; Survey; Information retrieval; Feature location; Probabilistic ranking; Traceability links; Execution; Prediction; Cohesion; System; Combination; Metrics
DOI
10.1007/s10664-015-9402-8
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. Recent advances in IR, especially statistical topic models, have further helped make sense of unstructured data in software repositories. However, even though there are hundreds of studies on applying topic models to software repositories, there is no study that shows how the models are used in the software engineering research community, and which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how to best apply topic models to a particular software engineering task.
Pages: 1843-1919
Page count: 77
Related papers
50 in total
  • [1] A survey on the use of topic models when mining software repositories
    Tse-Hsun Chen
    Stephen W. Thomas
    Ahmed E. Hassan
    Empirical Software Engineering, 2016, 21 : 1843 - 1919
  • [2] Mining Software Repositories Using Topic Models
    Thomas, Stephen W.
    2011 33RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2011, : 1138 - 1139
  • [3] A Survey on Mining Software Repositories
    Jung, Woosung
    Lee, Eunjoo
    Wu, Chisu
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2012, E95D (05): : 1384 - 1406
  • [4] MSR4SM: Using topic models to effectively mining software repositories for software maintenance tasks
    Sun, Xiaobing
    Li, Bixin
    Leung, Hareton
    Li, Bin
    Li, Yun
    INFORMATION AND SOFTWARE TECHNOLOGY, 2015, 66 : 1 - 12
  • [5] Mining software repositories for comprehensible software fault prediction models
    Vandecruys, Olivier
    Martens, David
    Baesens, Bart
    Mues, Christophe
    De Backer, Manu
    Haesen, Raf
    JOURNAL OF SYSTEMS AND SOFTWARE, 2008, 81 (05) : 823 - 839
  • [6] A survey and taxonomy of approaches for mining software repositories in the context of software evolution
    Kagdi, Huzefa
    Collard, Michael L.
    Maletic, Jonathan I.
    JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE, 2007, 19 (02): : 77 - 131
  • [7] Mining software repositories
    1600, Japan Society for Software Science and Technology (30):
  • [8] Mining Open Software Repositories
    Alonso Abad, Jesus
    Lopez Nozal, Carlos
    Maudes Raedo, Jesus M.
    ERCIM NEWS, 2014, (99): : 23 - 24
  • [9] Ethics in the mining of software repositories
    Nicolas E. Gold
    Jens Krinke
    Empirical Software Engineering, 2022, 27
  • [10] Use and Misuse of the Term "Experiment" in Mining Software Repositories Research
    Ayala, Claudia
    Turhan, Burak
    Franch, Xavier
    Juristo, Natalia
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2022, 48 (11) : 4229 - 4248