Improving Performance of Automatic Duplicate Bug Reports Detection using Longest Common Sequence: Introducing New Textual Features for Textual Similarity Detection

Cited by: 0
Authors
Neysiani, Behzad Soleimani [1]
Babamir, Seyed Morteza [1]
Affiliation
[1] Univ Kashan, Dept Software Engn, Fac Comp & Elect Engn, Kashan, Esfahan, Iran
Keywords
Triage System; Bug Reports; Duplicate; Automatic; Detection; Text Mining; Natural Language Processing; Information Retrieval; Longest Common Sequence;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Automatic duplicate bug report detection has been a well-known problem in mining software repositories since 2004, arising in software triage systems such as Bugzilla. Textual features, e.g. BM25F, which reflects the frequency of terms common to two reports, are the most important type of feature for similarity and duplicate detection. Sometimes a common sequence reveals more similarity between two texts than shared terms alone, so this paper proposes new features based on the longest common sequence (LCS) of two bug reports as new textual features for text similarity detection. The Android, Eclipse, Mozilla, and Open Office datasets are used to evaluate the proposed features. The experimental results show that the LCS-based features are important: after adding LCS, the accuracy, precision, and recall of the classifier prediction models improved on average by 4.5, 2.5, and 2.5 percent respectively, reaching up to 96, 98, and 97 percent respectively on average across different classifiers.
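To make the idea of an LCS-based textual feature concrete, here is a minimal sketch, not the authors' implementation. It assumes whitespace tokenization with lowercasing, and it assumes the non-contiguous (subsequence) variant of LCS computed by standard dynamic programming; the record does not say which variant or normalizations the paper uses. The names `tokenize`, `lcs_length`, and `lcs_features`, as well as the ratio features, are hypothetical illustrations.

```python
from __future__ import annotations


def tokenize(text: str) -> list[str]:
    # Hypothetical preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists.

    Classic O(len(a) * len(b)) dynamic programming, keeping only two rows.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1  # extend the match on the diagonal
            else:
                curr[j] = max(prev[j], curr[j - 1])  # skip a token in a or b
        prev = curr
    return prev[len(b)]


def lcs_features(report_a: str, report_b: str) -> dict[str, float]:
    """Hypothetical LCS-based similarity features for a pair of bug reports."""
    a, b = tokenize(report_a), tokenize(report_b)
    n = lcs_length(a, b)
    return {
        "lcs_len": float(n),
        # Normalized by each report's length and by the shorter report;
        # empty reports yield 0.0 to avoid division by zero.
        "lcs_ratio_a": n / len(a) if a else 0.0,
        "lcs_ratio_b": n / len(b) if b else 0.0,
        "lcs_ratio_min": n / min(len(a), len(b)) if a and b else 0.0,
    }


if __name__ == "__main__":
    r1 = "Crash when opening settings page on startup"
    r2 = "App crash when opening the settings page"
    print(lcs_features(r1, r2))
```

For the two example reports, the shared ordered tokens are "crash when opening settings page", so `lcs_len` is 5 and each ratio is 5/7. In a setup like the one the abstract describes, such values would presumably be appended to a pair's existing feature vector (for example alongside BM25F) before training a classifier; how the paper actually combines them is not stated in this record.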
Pages: 378-383
Number of pages: 6
Related papers
3 in total
  • [1] New Methodology for Contextual Features Usage in Duplicate Bug Reports Detection
    Neysiani, Behzad Soleimani
    Babamir, Seyed Morteza
    2019 5TH INTERNATIONAL CONFERENCE ON WEB RESEARCH (ICWR), 2019, pp. 178-183
  • [2] AUTOMATIC DETECTION OF CONTRASTIVE WORD PAIRS USING TEXTUAL AND ACOUSTIC FEATURES
    Zang, Xiao
    Wu, Zhiyong
    Ning, Yishuang
    Meng, Helen
    Cai, Lianhong
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014, pp. 594-598
  • [3] Automatic patch linkage detection in code review using textual content and file location features
    Wang, Dong
    Kula, Raula Gaikovina
    Ishio, Takashi
    Matsumoto, Kenichi
    INFORMATION AND SOFTWARE TECHNOLOGY, 2021, vol. 139