Cross-project Defect Prediction Method Using Adversarial Learning

Authors
Xing Y. [1]
Qian X.-M. [2]
Guan Y. [1]
Zhang S.-H. [1]
Zhao M.-C. [1]
Lin W.-T. [1]
Affiliations
[1] School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing
[2] School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing
Source
Ruan Jian Xue Bao/Journal of Software | 2022 / Vol. 33 / No. 06
Keywords
Abstract syntax tree (AST); Bag-of-words model; Cross-project defect prediction; Generative adversarial network (GAN)
DOI
10.13328/j.cnki.jos.006571
Abstract
Cross-project defect prediction (CPDP) has become an important research direction in software engineering data mining. It builds prediction models from the defect data of other projects, which addresses the shortage of training data during model construction. However, the code files of source and target projects differ in data distribution, which leads to poor cross-project prediction results. Drawing on the adversarial learning idea of the generative adversarial network (GAN), the proposed approach uses a discriminator to shift the distribution of target project features toward that of the source project features, thereby improving CPDP performance. Specifically, the proposed abstract continuous GAN (AC-GAN) method consists of two stages: data processing and model construction. First, the source and target project code is converted into abstract syntax trees (ASTs), and the ASTs are traversed depth-first to derive token sequences. The continuous bag-of-words (CBOW) model is used to generate word vectors, and the token sequences are transformed into numeric vectors based on the word vector table. Second, the processed numeric vectors are fed into a GAN-based model for feature extraction and data migration. Finally, a binary classifier determines whether each target project code file is defective. Comparison experiments on 15 source-target project pairs demonstrate the effectiveness of the AC-GAN method. © Copyright 2022, Institute of Software, the Chinese Academy of Sciences. All rights reserved.
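The data-processing stage described in the abstract (code to AST, depth-first traversal to a token sequence, tokens to numbers) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Python source and the standard-library ast module as a stand-in for whatever parser the paper uses, it treats AST node type names as tokens, and it replaces the CBOW word-vector table with a plain integer vocabulary. The function names dfs_tokens, build_vocab, and encode are hypothetical.

```python
import ast
from typing import Dict, List


def dfs_tokens(source: str) -> List[str]:
    """Parse source code and return AST node type names in depth-first pre-order."""
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)      # assumption: node type name is the token
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(sequences: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an integer id; id 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids, then truncate or zero-pad to a fixed length."""
    ids = [vocab[tok] for tok in seq][:length]
    return ids + [0] * (length - len(ids))


if __name__ == "__main__":
    files = ["x = 1", "def f(a):\n    return a + 1"]
    seqs = [dfs_tokens(src) for src in files]
    vocab = build_vocab(seqs)
    for seq in seqs:
        print(encode(seq, vocab, length=12))
```

In a fuller pipeline, the integer ids would index into an embedding table trained with CBOW, and the resulting vectors would be the input to the GAN-based feature extractor.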
Pages: 2097-2112
Number of pages: 15