Bug localization using latent Dirichlet allocation

Cited by: 222
Authors
Lukins, Stacy K. [1]
Kraft, Nicholas A. [2]
Etzkorn, Letha H. [1]
Affiliations
[1] Univ Alabama, Dept Comp Sci, Huntsville, AL 35899 USA
[2] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
Funding
U.S. National Science Foundation
Keywords
Bug localization; Program comprehension; Latent Dirichlet allocation; Information retrieval; Design instability
DOI
10.1016/j.infsof.2010.04.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Context: Some recent static techniques for automatic bug localization have been built around modern information retrieval (IR) models such as latent semantic indexing (LSI). Latent Dirichlet allocation (LDA) is a generative statistical model that has significant advantages in modularity and extensibility over both LSI and probabilistic LSI (pLSI). Moreover, LDA has been shown to be effective in topic-model-based information retrieval. In this paper, we present a static LDA-based technique for automatic bug localization and evaluate its effectiveness.

Objective: We evaluate the accuracy and scalability of the LDA-based technique and investigate whether it is suitable for use with open-source software systems of varying size, including those developed using agile methods.

Method: We present five case studies designed to determine the accuracy and scalability of the LDA-based technique, as well as its relationships to software system size and to source code stability. The studies examine over 300 bugs across more than 25 iterations of three software systems.

Results: The results of the studies show that the LDA-based technique maintains sufficient accuracy across all bugs in a single iteration of a software system and scales to a large number of bugs across multiple revisions of two software systems. The results also indicate that the accuracy of the LDA-based technique is not affected by the size of the subject software system or by the stability of its source code base.

Conclusion: We conclude that an effective static technique for automatic bug localization can be built around LDA. We also conclude that there is no significant relationship between the accuracy of the LDA-based technique and the size of the subject software system or the stability of its source code base. Thus, the LDA-based technique is widely applicable.

(C) 2010 Elsevier B.V. All rights reserved.
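The abstract does not say how an LDA model is turned into a ranked list of candidate bug locations. The sketch below assumes one common query-likelihood formulation, which may differ in detail from the authors' implementation: each method in the source code is treated as an LDA document d, the preprocessed bug report is the query q, and methods are ranked by P(q | d) = ∏_{w∈q} Σ_k P(w | z_k) P(z_k | d). The tables PHI and THETA, the method names, and the `first_relevant_rank` accuracy measure are all hypothetical; the "model" is hand-specified rather than trained, whereas a real pipeline would first fit LDA to the preprocessed source code.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical stand-ins for a trained LDA model (hand-picked, not learned):
# PHI[k][w]   = P(word w | topic k), one dict per topic
# THETA[d][k] = P(topic k | document d), one row per method (document)
PHI: List[Dict[str, float]] = [
    {"parse": 0.5, "token": 0.4, "save": 0.1},  # topic 0: parsing
    {"save": 0.5, "file": 0.4, "token": 0.1},   # topic 1: file I/O
]
THETA: Dict[str, List[float]] = {
    "Parser.parse": [0.9, 0.1],
    "Editor.saveFile": [0.1, 0.9],
    "Util.log": [0.5, 0.5],
}


def word_prob(word: str, doc: str,
              phi: Sequence[Dict[str, float]],
              theta: Dict[str, Sequence[float]],
              floor: float = 1e-12) -> float:
    """P(w | d) = sum over topics k of P(w | k) * P(k | d), floored to avoid log(0)."""
    p = sum(phi_k.get(word, 0.0) * theta_dk
            for phi_k, theta_dk in zip(phi, theta[doc]))
    return max(p, floor)


def rank_documents(query: List[str],
                   phi: Sequence[Dict[str, float]],
                   theta: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank documents by log P(q | d), the sum of log P(w | d) over query words."""
    scores = {doc: sum(math.log(word_prob(w, doc, phi, theta)) for w in query)
              for doc in theta}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def first_relevant_rank(ranking: List[Tuple[str, float]],
                        relevant: Set[str]) -> Optional[int]:
    """1-based position of the first relevant document, or None if none is ranked."""
    for position, (doc, _score) in enumerate(ranking, start=1):
        if doc in relevant:
            return position
    return None


if __name__ == "__main__":
    bug_report = ["save", "file"]  # preprocessed bug report text used as the query
    ranking = rank_documents(bug_report, PHI, THETA)
    print([doc for doc, _ in ranking])
    # → ['Editor.saveFile', 'Util.log', 'Parser.parse']
    print(first_relevant_rank(ranking, {"Editor.saveFile"}))  # → 1
```

Scoring in log space avoids numeric underflow when a bug report contains many words, and the probability floor keeps a single word that no topic explains from zeroing out a method's entire score.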
Pages: 972 - 990
Page count: 19
Related papers
50 in total
  • [21] Learning and Using Context on a Humanoid Robot Using Latent Dirichlet Allocation
    Celikkanat, Hande
    Orhan, Guner
    Pugeault, Nicolas
    Guerin, Frank
    Sahin, Erol
    Kalkan, Sinan
    FOURTH JOINT IEEE INTERNATIONAL CONFERENCES ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (IEEE ICDL-EPIROB 2014), 2014, : 201 - 207
  • [22] Topic modeling for expert finding using latent Dirichlet allocation
    Momtazi, Saeedeh
    Naumann, Felix
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2013, 3 (05) : 346 - 353
  • [23] Video fingerprinting using Latent Dirichlet Allocation and facial images
    Vretos, Nicholas
    Nikolaidis, Nikos
    Pitas, Ioannis
    PATTERN RECOGNITION, 2012, 45 (07) : 2489 - 2498
  • [24] Mining Sentiments from Songs Using Latent Dirichlet Allocation
    Sharma, Govind
    Murty, M. Narasimha
    ADVANCES IN INTELLIGENT DATA ANALYSIS X: IDA 2011, 2011, 7014 : 328 - 339
  • [25] Terminological ontology learning and population using latent Dirichlet allocation
    Colace, Francesco
    De Santo, Massimo
    Greco, Luca
    Amato, Flora
    Moscato, Vincenzo
    Picariello, Antonio
    JOURNAL OF VISUAL LANGUAGES AND COMPUTING, 2014, 25 (06) : 818 - 826
  • [26] Feature extraction for document text using Latent Dirichlet Allocation
    Prihatini, P. M.
    Suryawan, I. K.
    Mandia, I. N.
    2ND INTERNATIONAL JOINT CONFERENCE ON SCIENCE AND TECHNOLOGY (IJCST) 2017, 2018, 953
  • [27] Semantic Annotation of Satellite Images Using Latent Dirichlet Allocation
    Lienou, Marie
    Maitre, Henri
    Datcu, Mihai
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2010, 7 (01) : 28 - 32
  • [28] Latent Dirichlet Allocation for Classification using Gene Expression Data
    Yalamanchili, Hima Bindu
    Kho, Soon Jye
    Raymer, Michael L.
    2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 39 - 44
  • [29] Identifying Top Listers in Alphabay Using Latent Dirichlet Allocation
    Grisham, John
    Barreras, Calvin
    Afarin, Cyran
    Patton, Mark
    Chen, Hsinchun
    IEEE INTERNATIONAL CONFERENCE ON INTELLIGENCE AND SECURITY INFORMATICS: CYBERSECURITY AND BIG DATA, 2016, : 219 - 219
  • [30] Obtaining Single Document Summaries Using Latent Dirichlet Allocation
    Nagesh, Karthik
    Murty, M. Narasimha
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT IV, 2012, 7666 : 66 - 74