FSD-CLCD: Functional semantic distillation graph learning for cross-language code clone detection

Cited by: 1
Authors
Zhang, Linghao [1]
Luo, Senlin [1]
Pan, Limin [1]
Wu, Zhouting [1]
Gong, Kun [1]
Affiliations
[1] Beijing Inst Technol, Informat Syst & Secur & Countermeasures Expt Ctr, Beijing 100081, Peoples R China
Keywords
Code clone detection; Cross-language; Graph similarity learning; Contrastive learning
DOI
10.1016/j.engappai.2024.108199
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Code clone detection finds identical or similar code snippets, which is important for analyzing homologous components, discovering redundant code, and improving the efficiency of software development and maintenance. A crucial challenge is extracting functional semantic similarity from code under heterogeneous conditions, such as the cross-language scenario. Existing methods mainly use sequence models that compare code pairs on lexical and statistical features only; they are susceptible to linguistic feature noise and misclassify code pairs that have similar structural dependencies such as control flow. In addition, node types are inconsistent and node counts vary greatly when structure-dependent features are captured, which misaligns the distribution of clone pairs and weakens detection precision. This work presents a novel cross-language code clone detection method. It represents code as a graph based on abstract syntax trees and introduces a global node to strengthen the connections between control flows. It prunes the graph according to key-node protection rules to reduce the impact of linguistic feature noise. It also optimizes graph matching networks for cross-language abstract syntax trees with a contrastive loss that aligns the functional semantic distribution of clone pairs. The method distills invariant functional semantic similarity despite the large discrepancies between code graphs under heterogeneous cross-language conditions. Experimental results show that the proposed method achieves a precision of 0.95, a recall of 0.98, and an F1-score of 0.96, substantially outperforming state-of-the-art baselines.
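Two of the ideas named in the abstract, an AST-based graph with an added global node and a contrastive loss over pair embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation. It uses Python's standard `ast` module for a single language, and the names `CONTROL_FLOW`, `build_code_graph`, and `contrastive_loss` are hypothetical. The set of control-flow node types and the margin-based form of the loss are assumptions, since the abstract does not specify them. Pruning and the graph matching network are not shown.

```python
import ast
import math

# Assumption: which AST node types count as "control flow" is chosen here
# purely for illustration; the abstract does not list them.
CONTROL_FLOW = (ast.If, ast.For, ast.While, ast.Try)


def build_code_graph(source):
    """Return (labels, edges) for an AST graph with one extra global node.

    Node 0 is a hypothetical global node; it is linked to every
    control-flow node so that distant control structures become neighbours.
    """
    tree = ast.parse(source)
    labels = ["<GLOBAL>"]
    edges = []

    def visit(node, parent):
        idx = len(labels)
        labels.append(type(node).__name__)
        if parent is not None:
            edges.append((parent, idx))      # ordinary AST parent-child edge
        if isinstance(node, CONTROL_FLOW):
            edges.append((0, idx))           # global node to control-flow edge
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(tree, None)
    return labels, edges


def contrastive_loss(u, v, is_clone, margin=1.0):
    """Margin-based contrastive loss on two embedding vectors (assumed form).

    Clone pairs are pulled together; non-clone pairs are pushed apart
    until their distance reaches the margin.
    """
    d = math.dist(u, v)
    if is_clone:
        return d ** 2
    return max(0.0, margin - d) ** 2


SAMPLE = (
    "def f(xs):\n"
    "    t = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            t += x\n"
    "    return t\n"
)
labels, edges = build_code_graph(SAMPLE)
global_edges = [e for e in edges if e[0] == 0]
```

In this sketch the sample function yields two global edges, one to the `For` node and one to the `If` node. A real cross-language pipeline would need a parser per language and a shared node vocabulary, which the abstract identifies as a source of difficulty.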
Pages: 15