FSD-CLCD: Functional semantic distillation graph learning for cross-language code clone detection

Cited by: 1
Authors
Zhang, Linghao [1]
Luo, Senlin [1]
Pan, Limin [1]
Wu, Zhouting [1]
Gong, Kun [1]
Affiliations
[1] Beijing Inst Technol, Informat Syst & Secur & Countermeasures Expt Ctr, Beijing 100081, Peoples R China
Keywords
Code clone detection; Cross-language; Graph similarity learning; Contrastive learning
DOI
10.1016/j.engappai.2024.108199
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Code clone detection finds similar or identical code snippets, which is important for analyzing homologous components, discovering redundant code, and improving the efficiency of software development and maintenance. A crucial challenge is to extract functional semantic similarity from code under heterogeneous conditions, such as a cross-language scenario. Existing methods mainly use sequence models with only lexical and statistical features to compare code pairs; they are susceptible to linguistic feature noise and misclassify code pairs that have similar structural dependencies such as control flow. Meanwhile, capturing structure-dependent features is hampered by inconsistent node types and large variation in node counts, which misaligns the distribution of clone pairs and weakens detection precision. This work presents a novel cross-language code clone detection method. It represents code as a graph based on abstract syntax trees and introduces a global node to strengthen the connection between control flows. It prunes the graph according to key node protection rules to reduce the impact of linguistic feature noise. It also optimizes graph matching networks for cross-language abstract syntax trees by using a contrastive loss to align the functional semantic distribution of clone pairs. The method distills the invariant functional semantic similarity despite the large discrepancy between code graphs in heterogeneous cross-language conditions. Experimental results show that the proposed method achieves a precision of 0.95, a recall of 0.98, and an F1-score of 0.96, substantially outperforming the state-of-the-art baselines.
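The graph construction outlined in the abstract (an AST-based graph, a global node tied to control flow, and pruning that protects key nodes) might look roughly like the following minimal sketch. It is not the authors' implementation: the abstract does not give the protection rules, so the `PROTECTED` and `CONTROL_FLOW` sets are hypothetical, and Python's standard `ast` module stands in for whatever cross-language parser the paper uses.

```python
import ast

# Hypothetical control-flow and "key" node types (assumptions, not the paper's rule set).
CONTROL_FLOW = (ast.If, ast.For, ast.While, ast.Try, ast.Return)
PROTECTED = CONTROL_FLOW + (ast.FunctionDef, ast.Call, ast.BinOp, ast.Compare, ast.Assign)

GLOBAL = 0  # id of the artificial global node


def build_graph(source):
    """Return (labels, edges) for a pruned AST graph with one global node."""
    labels = {GLOBAL: "GLOBAL"}
    edges = set()

    def visit(node, parent_id):
        if isinstance(node, PROTECTED):
            node_id = len(labels)
            labels[node_id] = type(node).__name__
            if parent_id is not None:
                edges.add((parent_id, node_id))
            # The global node links directly to every control-flow node.
            if isinstance(node, CONTROL_FLOW):
                edges.add((GLOBAL, node_id))
            parent_id = node_id
        # Unprotected nodes are pruned; their children attach to the nearest kept ancestor.
        for child in ast.iter_child_nodes(node):
            visit(child, parent_id)

    visit(ast.parse(source), None)
    return labels, edges


SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s = s + x
    return s
"""

labels, edges = build_graph(SAMPLE)
print(sorted(labels[dst] for src, dst in edges if src == GLOBAL))
```

In this sketch, identifier and literal nodes such as `Name` and `Constant` are dropped as language-specific noise, while the loop, branch, and return nodes each gain a direct edge from the global node. Which node types actually deserve protection is a design choice the abstract leaves open.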
Pages: 15
Related papers
38 in total
  • [1] CLCD-I: Cross-Language Clone Detection by Using Deep Learning with InferCode
    Yahya, Mohammad A. A.
    Kim, Dae-Kyoo
    COMPUTERS, 2023, 12 (01)
  • [2] Improving Cross-Language Code Clone Detection via Code Representation Learning and Graph Neural Networks
    Mehrotra, Nikita
    Sharma, Akash
    Jindal, Anmol
    Purandare, Rahul
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (11) : 4846 - 4868
  • [3] Cross-language Source Code Clone Detection Based On Graph Neural Network
    Zhang, Yuguo
    Yang, Jia
    Ruan, Ou
    PROCEEDINGS OF 2024 3RD INTERNATIONAL CONFERENCE ON CRYPTOGRAPHY, NETWORK SECURITY AND COMMUNICATION TECHNOLOGY, CNSCT 2024, 2024 : 189 - 194
  • [4] Cross-Language Code Similarity and Applications in Clone Detection and Code Search
    Mathew, George Varghese
    ProQuest Dissertations and Theses Global, 2022
  • [5] C4: Contrastive Cross-Language Code Clone Detection
    Tao, Chenning
    Zhan, Qi
    Hu, Xing
    Xia, Xin
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022 : 413 - 424
  • [6] TCCCD: Triplet-Based Cross-Language Code Clone Detection
    Fang, Yong
    Zhou, Fangzheng
    Xu, Yijia
    Liu, Zhonglin
    APPLIED SCIENCES-BASEL, 2023, 13 (21)
  • [7] Cross-language clone detection by learning over abstract syntax trees
    Perez, Daniel
    Chiba, Shigeru
    IEEE International Working Conference on Mining Software Repositories, 2019, 2019-May : 518 - 528
  • [8] Graph-based code semantics learning for efficient semantic code clone detection
    Yu, Dongjin
    Yang, Quanxin
    Chen, Xin
    Chen, Jie
    Xu, Yihang
    INFORMATION AND SOFTWARE TECHNOLOGY, 2023, 156
  • [9] Structural and Nominal Cross-Language Clone Detection
    Nichols, Lawton
    Emre, Mehmet
    Hardekopf, Ben
    FUNDAMENTAL APPROACHES TO SOFTWARE ENGINEERING (FASE 2019), 2019, 11424 : 247 - 263
  • [10] LICCA: A Tool for Cross-Language Clone Detection
    Vislayski, Tijana
    Rakic, Gordana
    Cardozo, Nicolas
    Budimac, Zoran
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION AND REENGINEERING (SANER 2018), 2018 : 512 - 516