Review Sharing via Deep Semi-Supervised Code Clone Detection

Cited by: 9
Authors
Guo, Chenkai [1]
Yang, Hui [1]
Huang, Dengrong [1]
Zhang, Jianwen [3]
Dong, Naipeng [2]
Xu, Jing [3]
Zhu, Jingwen [4]
Affiliations
[1] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
[2] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[3] Nankai Univ, Coll Artificial Intelligence, Tianjin 300350, Peoples R China
[4] Nankai Univ, Coll Software, Tianjin 300350, Peoples R China
Keywords
Code clone; software review; deep learning; semi-supervised CNN; review sharing; NETWORK; GRAPH;
DOI
10.1109/ACCESS.2020.2966532
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Code review, a typical form of user feedback, has recently drawn increasing attention as a means of improving code quality. Research on code review normally requires sufficient review data. As a result, recent efforts commonly focus on projects with sufficient reviews (called "s-projects") rather than projects with extremely few reviews (called "f-projects"). In fact, statistics from public platforms show that the latter dominate open source software, so novel approaches should be explored to support their review-based code improvement. In this paper, we address the problem by building a review sharing channel through which informative reviews can be reasonably delivered from s-projects to f-projects. To ensure the accuracy of shared reviews, we introduce a novel code clone detection model based on a Convolutional Neural Network (CNN) and build suitable "s-project, f-project" pairs through clone detection. In particular, to alleviate dataset heterogeneity between training and testing, an autoencoder-based semi-supervised learning strategy is employed. Furthermore, to improve the sharing experience, heuristic filtering tactics are applied to reduce the time cost, and an LDA (Latent Dirichlet Allocation)-based ranking algorithm is used to present diverse review themes. We have implemented the sharing channel as a prototype system, RSharer+, which contains three representative modules: data preprocessing, code clone detection, and review presentation. The collected datasets are first transformed into context-sensitive numerical vectors in the data preprocessing module. Then, in the clone detection module, the data vectors are trained and tested on BigCloneBench and real code-review pairs. Finally, the presentation module provides review classification and theme extraction for a better sharing experience.
Extensive comparative experiments on hundreds of real labelled code fragments demonstrate the precision of clone detection and the effectiveness of review sharing.
Pages: 24948-24965
Number of pages: 18