A survey on the use of topic models when mining software repositories

Cited by: 0
Authors
Tse-Hsun Chen
Stephen W. Thomas
Ahmed E. Hassan
Affiliations
[1] Queen’s University, Software Analysis and Intelligence Lab (SAIL)
Keywords
Topic modeling; LDA; LSI; Survey
DOI
Not available
Abstract
Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. Recent advances in IR, especially statistical topic models, have further helped make sense of the unstructured data in software repositories. However, even though there are hundreds of studies on applying topic models to software repositories, no study shows how the models are used in the software engineering research community or which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how best to apply topic models to a particular software engineering task.
Pages: 1843 - 1919
Page count: 76
Related papers
50 in total
  • [41] Mining Software Repositories for Automatic Interface Recommendation
    Sun, Xiaobing
    Li, Bin
    Duan, Yucong
    Shi, Wei
    Liu, Xiangyue
    SCIENTIFIC PROGRAMMING, 2016, 2016
  • [42] Research on mining software repositories to facilitate refactoring
    Nyamawe, Ally S.
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (05)
  • [43] Visual data mining and analysis of software repositories
    Voinea, Lucian
    Telea, Alexandru
    COMPUTERS & GRAPHICS-UK, 2007, 31 (03) : 410 - 428
  • [44] Manas: Mining Software Repositories to Assist AutoML
    Nguyen, Giang
    Islam, Md Johirul
    Pan, Rangeet
    Rajan, Hridesh
    2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 1368 - 1380
  • [45] Mining software repositories for software architecture - A systematic mapping study
    Soliman, Mohamed
    Albonico, Michel
    Malavolta, Ivano
    Wortmann, Andreas
    INFORMATION AND SOFTWARE TECHNOLOGY, 2025, 181
  • [46] Guest editorial: special section on mining software repositories
    Di Penta, Massimiliano
    Xie, Tao
    EMPIRICAL SOFTWARE ENGINEERING, 2015, 20 (02) : 291 - 293
  • [47] Introduction to the Special Issue on Mining Software Repositories in 2010
    Whitehead, Jim
    Zimmermann, Thomas
    EMPIRICAL SOFTWARE ENGINEERING, 2012, 17 (4-5) : 500 - 502
  • [48] Guest Editorial: Special Section on Mining Software Repositories
    Tan, Lin
    Hindle, Abram
    EMPIRICAL SOFTWARE ENGINEERING, 2019, 24 (03) : 1458 - 1460
  • [49] Guest Editorial: Special section on mining software repositories
    Robbes, Romain
    Hill, Emily
    Bird, Christian
    EMPIRICAL SOFTWARE ENGINEERING, 2018, 23 : 833 - 834
  • [50] MSR 2004 - International Workshop on Mining Software Repositories
    Hassan, AE
    Holt, RC
    Mockus, A
    ICSE 2004: 26TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, PROCEEDINGS, 2004, : 770 - 771