A survey on the use of topic models when mining software repositories

Cited by: 0
Authors
Tse-Hsun Chen
Stephen W. Thomas
Ahmed E. Hassan
Affiliations
[1] Queen’s University, Software Analysis and Intelligence Lab (SAIL)
Keywords
Topic modeling; LDA; LSI; Survey
DOI
Not available
Abstract
Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. Recent advances in IR, especially statistical topic models, have further helped make sense of unstructured data in software repositories. However, even though there are hundreds of studies on applying topic models to software repositories, no study shows how the models are used in the software engineering research community or which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how best to apply topic models to a particular software engineering task.
Pages: 1843-1919
Page count: 76
Related papers
50 in total
  • [31] Changeset-Based Topic Modeling of Software Repositories
    Corley, Christopher S.
    Damevski, Kostadin
    Kraft, Nicholas A.
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2020, 46 (10) : 1068 - 1080
  • [32] Exploring Topic Models in Software Engineering Data Analysis: A Survey
    Sun, Xiaobing
    Liu, Xiangyue
    Li, Bin
    Duan, Yucong
    Yang, Hui
    Hu, Jiajun
    2016 17TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2016, : 357 - 362
  • [33] Introduction to the special issue on mining software repositories
    Tao Xie
    Thomas Zimmermann
    Arie van Deursen
    Empirical Software Engineering, 2013, 18 : 1043 - 1046
  • [34] Mining expertise of developers from software repositories
    Hammad, Maen
    Hijazi, Haneen
    Hammad, Mustafa
    Otoom, Ahmed Fawzi
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2020, 62 (03) : 227 - 239
  • [35] Introduction to the special issue on mining software repositories
    Xie, Tao
    Zimmermann, Thomas
    van Deursen, Arie
    EMPIRICAL SOFTWARE ENGINEERING, 2013, 18 (06) : 1043 - 1046
  • [36] Guest editorial: Mining software repositories 2018
    Kamei, Yasutaka
    Zaidman, Andy
    EMPIRICAL SOFTWARE ENGINEERING, 2020, 25 (03) : 2055 - 2057
  • [37] Guest editorial: Mining software repositories 2018
    Yasutaka Kamei
    Andy Zaidman
    Empirical Software Engineering, 2020, 25 : 2055 - 2057
  • [38] Mining Software Repositories to Identify Library Experts
    Santos, Adriano
    Souza, Mauricio
    Oliveira, Johnatan
    Figueiredo, Eduardo
    XII BRAZILIAN SYMPOSIUM ON SOFTWARE COMPONENTS, ARCHITECTURES, AND REUSE (SBCARS), 2018, : 83 - 91
  • [39] Mining Software Repositories with a Collaborative Heuristic Repository
    Babii, Hlib
    Prenner, Julian Aron
    Stricker, Laurin
    Karmakar, Anjan
    Janes, Andrea
    Robbes, Romain
    2021 ACM/IEEE 43RD INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: NEW IDEAS AND EMERGING RESULTS (ICSE-NIER 2021), 2021, : 106 - 110
  • [40] MetricMiner: Supporting Researchers in Mining Software Repositories
    Sokol, Francisco Zigmund
    Aniche, Mauricio Finavaro
    Gerosa, Marco Aurelio
    2013 IEEE 13TH INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM), 2013, : 142 - 146