The Impact of Classifier Configuration and Classifier Combination on Bug Localization

Cited by: 74
Authors
Thomas, Stephen W. [1]
Nagappan, Meiyappan [1]
Blostein, Dorothea [1]
Hassan, Ahmed E. [1]
Affiliation
[1] Queens Univ, Sch Comp, Kingston, ON K7K 2N8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Software maintenance; bug localization; information retrieval; VSM; LSI; LDA; classifier combination; LOCATION
DOI
10.1109/TSE.2013.27
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Bug localization is the task of determining which source code entities are relevant to a bug report. Manual bug localization is labor-intensive because developers must consider thousands of source code entities. Current research builds bug localization classifiers, based on information retrieval models, to locate entities that are textually similar to the bug report. Current research, however, does not consider the effect of classifier configuration, i.e., all the parameter values that specify the behavior of a classifier. As a result, neither the effect of each parameter nor the parameter values that lead to the best performance are known. In this paper, we empirically investigate the effectiveness of a large space of classifier configurations, 3,172 in total. Further, we introduce a framework for combining the results of multiple classifier configurations, since classifier combination has shown promise in other domains. Through a detailed case study on over 8,000 bug reports from three large-scale projects, we make two main contributions. First, we show that the parameters of a classifier have a significant impact on its performance. Second, we show that combining multiple classifiers, whether those classifiers are hand-picked or randomly chosen relative to intelligently defined subspaces of classifiers, improves the performance of even the best individual classifiers.
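To make "classifier configuration" and "classifier combination" concrete, here is a minimal Python sketch. It is not the authors' implementation, and its details are assumptions. It models a VSM classifier whose configuration consists of two hypothetical parameters: a term-frequency scheme and whether to apply IDF weighting. The classifier ranks source files by cosine similarity to a bug report. The rankings of several configurations are then merged with Borda count, a standard rank-fusion rule chosen here only for illustration. The file names and tokens are invented toy data.

```python
import math
from collections import Counter


def tf_weight(count, scheme):
    """Term-frequency weight; `scheme` is a hypothetical configuration parameter."""
    if scheme == "raw":
        return float(count)
    if scheme == "log":
        return 1.0 + math.log(count)
    if scheme == "binary":
        return 1.0
    raise ValueError(f"unknown tf scheme: {scheme}")


def vectorize(tokens, idf, scheme, use_idf):
    """Build a sparse term -> weight vector for one document or query."""
    counts = Counter(tokens)
    return {t: tf_weight(c, scheme) * (idf.get(t, 0.0) if use_idf else 1.0)
            for t, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_files(corpus, query, scheme="log", use_idf=True):
    """One VSM 'classifier configuration': rank files by similarity to the bug report."""
    n = len(corpus)
    df = Counter(t for toks in corpus.values() for t in set(toks))
    idf = {t: math.log(n / d) for t, d in df.items()}
    q = vectorize(query, idf, scheme, use_idf)
    scores = {f: cosine(q, vectorize(toks, idf, scheme, use_idf))
              for f, toks in corpus.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


def borda_combine(rankings):
    """Borda count: a file at position i of an n-item ranking earns n - i points."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for pos, f in enumerate(ranking):
            points[f] += n - pos
    return sorted(points, key=lambda f: (-points[f], f))


# Hypothetical toy data: preprocessed tokens per source file and per bug report.
corpus = {
    "Parser.java": ["parse", "token", "syntax", "error"],
    "Renderer.java": ["draw", "pixel", "color", "render"],
    "Cache.java": ["cache", "evict", "memory", "token"],
}
bug_report = ["syntax", "error", "parse", "crash"]

configs = [("log", True), ("raw", False), ("binary", True)]
rankings = [rank_files(corpus, bug_report, s, i) for s, i in configs]
print(borda_combine(rankings))  # ['Parser.java', 'Cache.java', 'Renderer.java']
```

Borda count is used here because it needs only ranks, not comparable scores, so it can merge configurations whose raw similarity values are on different scales. A real study would also vary preprocessing steps and the IR model itself (for example LSI or LDA).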
Pages: 1427-1443
Number of pages: 17