A survey on the use of topic models when mining software repositories

Cited by: 0
Authors
Tse-Hsun Chen
Stephen W. Thomas
Ahmed E. Hassan
Affiliations
[1] Queen’s University, Software Analysis and Intelligence Lab (SAIL)
Source
Keywords
Topic modeling; LDA; LSI; Survey;
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. Recent advances in IR, especially statistical topic models, have further helped make sense of unstructured data in software repositories. However, even though there are hundreds of studies on applying topic models to software repositories, there is no study that shows how the models are used in the software engineering research community, and which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how best to apply topic models to a particular software engineering task.
Pages: 1843 - 1919
Number of pages: 76
Related papers
50 in total
  • [21] On Mining Data across Software Repositories
    Anbalagan, Prasanth
    Vouk, Mladen
    2009 6TH IEEE INTERNATIONAL WORKING CONFERENCE ON MINING SOFTWARE REPOSITORIES, 2009, : 171 - 174
  • [22] A process to mining issues of Software Repositories
    Bautista, Ana Maria
    San Feliu, Tomas
    2015 10TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI), 2015,
  • [23] Mining Software Repositories - A Comparative Analysis
    Olatunji, Sunday O.
    Idrees, Syed U.
    Al-Ghamdi, Yasser S.
    Al-Ghamdi, Jarallah Saleh Ali
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2010, 10 (08): : 161 - 174
  • [24] Guest Editorial: Mining software repositories
    Romain Robbes
    Yasutaka Kamei
    Martin Pinzger
    Empirical Software Engineering, 2017, 22 : 1143 - 1145
  • [25] Mining Software Repositories for Accurate Authorship
    Meng, Xiaozhu
    Miller, Barton P.
    Williams, William R.
    Bernat, Andrew R.
    2013 29TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE (ICSM), 2013, : 250 - 259
  • [26] The challenges & case for mining software repositories
    Razzaq, Saad
    Maqbool, Fahad
    Anjum, Bilal
    Zafar, Samreen
    Laila, Umme
    Noor, Faiza
    IMECS 2007: INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, VOLS I AND II, 2007, : 734 - +
  • [27] Guest Editorial: Mining software repositories
    Robbes, Romain
    Kamei, Yasutaka
    Pinzger, Martin
    EMPIRICAL SOFTWARE ENGINEERING, 2017, 22 (03) : 1143 - 1145
  • [28] Mining Software Repositories for Social Norms
    Dam, Hoa Khanh
    Savarimuthu, Bastin Tony Roy
    Avery, Daniel
    Ghose, Aditya
    2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol 2, 2015, : 627 - 630
  • [29] Software Process Simulation based on Mining Software Repositories
    Honsel, Verena
    Honsel, Daniel
    Grabowski, Jens
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW), 2014, : 828 - 831
  • [30] Mining Software Code Repositories and Bug Databases using Survival Analysis Models
    Wedel, Michael
    Jensen, Uwe
    Goehner, Peter
    ESEM'08: PROCEEDINGS OF THE 2008 ACM-IEEE INTERNATIONAL SYMPOSIUM ON EMPIRICAL SOFTWARE ENGINEERING AND MEASUREMENT, 2008, : 282 - +