Review Sharing via Deep Semi-Supervised Code Clone Detection

Cited by: 9
Authors
Guo, Chenkai [1]
Yang, Hui [1]
Huang, Dengrong [1]
Zhang, Jianwen [3]
Dong, Naipeng [2]
Xu, Jing [3]
Zhu, Jingwen [4]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[3] Nankai Univ, Coll Artificial Intelligence, Tianjin 300350, Peoples R China
[4] Nankai Univ, Coll Software, Tianjin 300350, Peoples R China
Keywords
Code clone; software review; deep learning; semi-supervised CNN; review sharing; NETWORK; GRAPH;
DOI
10.1109/ACCESS.2020.2966532
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Code review, as a typical type of user feedback, has recently drawn increasing attention as a means of improving code quality. Research on code review normally requires sufficient review data. As a result, recent efforts commonly focus on analyzing projects with sufficient reviews (called "s-projects") rather than projects with extremely few reviews (called "f-projects"). However, statistics from public platforms show that the latter dominate open source software, so novel approaches are needed to support review-based code improvement for them. In this paper, we address the problem by building a review sharing channel through which informative reviews can be delivered from s-projects to f-projects. To ensure the accuracy of shared reviews, we introduce a novel code clone detection model based on a Convolutional Neural Network (CNN) and build suitable (s-project, f-project) pairs through clone detection. In particular, to alleviate the dataset heterogeneity between training and testing, an autoencoder-based semi-supervised learning strategy is employed. Furthermore, to improve the sharing experience, heuristic filtering tactics are applied to reduce the time cost, and a Latent Dirichlet Allocation (LDA)-based ranking algorithm is used to present diverse review themes. We have implemented the sharing channel as a prototype system, RSharer+, which contains three representative modules: data preprocessing, code clone detection, and review presentation. In data preprocessing, the collected datasets are first transformed into context-sensitive numerical vectors. Then, in clone detection, the data vectors are trained and tested on BigCloneBench and real code-review pairs. Finally, the presentation module provides review classification and theme extraction for a better sharing experience.
Extensive comparative experiments on hundreds of real labelled code fragments demonstrate the precision of clone detection and the effectiveness of review sharing.
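The sharing idea in the abstract (detect clones between an f-project fragment and s-project fragments, then pass along the reviews attached to the matching s-project code) can be illustrated with a minimal sketch. This is not the RSharer+ implementation: the paper's detector is a semi-supervised CNN, whereas the sketch below substitutes a simple token-set Jaccard similarity as a stand-in. The function names (`tokens`, `jaccard`, `share_reviews`), the threshold of 0.6, and the sample fragments are all assumptions made for illustration.

```python
import re


def tokens(code):
    """Split code into a set of identifiers and single non-space symbols."""
    return set(re.findall(r"[A-Za-z_]\w*|\S", code))


def jaccard(a, b):
    """Token-set Jaccard similarity; a stand-in for the paper's CNN detector."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def share_reviews(f_fragment, s_fragments, threshold=0.6):
    """Return reviews from s-project fragments that look like clones.

    s_fragments is a list of (code, reviews) pairs. The threshold is a
    hypothetical value, not one taken from the paper.
    """
    scored = []
    for code, reviews in s_fragments:
        score = jaccard(f_fragment, code)
        if score >= threshold:
            scored.extend((score, review) for review in reviews)
    # Most similar source fragments first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [review for _, review in scored]


if __name__ == "__main__":
    f_code = "def add(a, b): return a + b"
    s_data = [
        ("def add(x, y): return x + y", ["Consider adding type hints."]),
        ("print('hello')", ["Use logging instead of print."]),
    ]
    print(share_reviews(f_code, s_data))
```

In a fuller pipeline, the similarity function would be replaced by a learned clone detector, and the returned reviews would then be filtered and ranked by theme, as the abstract describes.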
Pages: 24948-24965
Number of pages: 18
Related papers
50 in total
  • [21] Deep semi-supervised learning for medical image segmentation: A review
    Han, Kai
    Sheng, Victor S.
    Song, Yuqing
    Liu, Yi
    Qiu, Chengjian
    Ma, Siqi
    Liu, Zhe
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 245
  • [22] Intent Detection and Discovery from User Logs via Deep Semi-Supervised Contrastive Clustering
    Kumar, Rajat
    Patidar, Mayur
    Varshney, Vaibhav
    Vig, Lovekesh
    Shroff, Gautam
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1836 - 1853
  • [23] Semi-Supervised Outlier Detection via Bipartite Graph Clustering
    El-Kilany, Ayman
    El Tazi, Neamat
    Ezzat, Ehab
    2016 IEEE/ACS 13TH INTERNATIONAL CONFERENCE OF COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2016,
  • [24] A review on semi-supervised clustering
    Cai, Jianghui
    Hao, Jing
    Yang, Haifeng
    Zhao, Xujun
    Yang, Yuqing
    INFORMATION SCIENCES, 2023, 632 : 164 - 200
  • [25] SAR Target Detection Network via Semi-supervised Learning
    Du Lan
    Wei Di
    Li Lu
    Guo Yuchen
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2020, 42 (01) : 154 - 163
  • [26] Spoof Face Detection Via Semi-Supervised Adversarial Training
    Chen, Chengwei
    Jing, Yaping
    Lu, Xuequan
    Yuan, Wang
    Ma, Lizhuang
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [27] GANomaly: Semi-supervised Anomaly Detection via Adversarial Training
    Akcay, Samet
    Atapour-Abarghouei, Amir
    Breckon, Toby P.
    COMPUTER VISION - ACCV 2018, PT III, 2019, 11363 : 622 - 637
  • [28] A Social Spam Detection Framework via Semi-supervised Learning
    Zhang, Xianchao
    Bai, Haijun
    Liang, Wenxin
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING (PAKDD 2016), 2016, 9794 : 214 - 226
  • [29] Semi-supervised pedestrian and face detection via multiple teachers
    Gu, Yu
    Lu, Tao
    Fang, Wenhua
    Zhang, Yanduo
    ELECTRONICS LETTERS, 2022, 58 (22) : 825 - 827
  • [30] FMixCutMatch for semi-supervised deep learning
    Wei, Xiang
    Wei, Xiaotao
    Kong, Xiangyuan
    Lu, Siyang
    Xing, Weiwei
    Lu, Wei
    Neural Networks, 2021, 133 : 166 - 176