FSD-CLCD: Functional semantic distillation graph learning for cross-language code clone detection

Times cited: 1
Authors
Zhang, Linghao [1]
Luo, Senlin [1]
Pan, Limin [1]
Wu, Zhouting [1]
Gong, Kun [1]
Affiliation
[1] Beijing Inst Technol, Informat Syst & Secur & Countermeasures Expt Ctr, Beijing 100081, Peoples R China
Keywords
Code clone detection; Cross-language; Graph similarity learning; Contrastive learning
DOI
10.1016/j.engappai.2024.108199
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Code clone detection finds similar or identical code snippets, which is important for analyzing homologous components, discovering redundant code, and improving the efficiency of software development and maintenance. A crucial challenge is extracting functional semantic similarity from code under heterogeneous conditions, such as cross-language scenarios. Existing methods mainly use sequence models that compare code pairs using only lexical and statistical features; these models are susceptible to linguistic feature noise and misclassify code pairs that share structural dependencies such as control flow. Meanwhile, when structure-dependent features are captured, inconsistent node types and large variations in node counts lead to a misaligned distribution of clone pairs and weaken detection precision. This work presents a novel cross-language code clone detection method. It represents code as a graph built from abstract syntax trees and introduces a global node to strengthen the connections between control flows. It then prunes the graph according to key-node protection rules to reduce the impact of linguistic feature noise. In addition, it optimizes graph matching networks for cross-language abstract syntax trees, using a contrastive loss to align the functional semantic distribution of clone pairs. The method thereby distills invariant functional semantic similarity despite large discrepancies between code graphs under heterogeneous cross-language conditions. Experimental results show that the proposed method achieves a precision of 0.95, a recall of 0.98, and an F1-score of 0.96, substantially outperforming the state-of-the-art baselines.
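To make the pipeline in the abstract more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. It uses Python's standard `ast` module as a stand-in for a per-language parser, assumes that `If`/`For`/`While`/`Return`/`Break`/`Continue` are the control-flow nodes a hypothetical `GLOBAL` node links to, and applies a hypothetical pruning rule that drops load/store context markers (`ast.expr_context`) while never pruning a hypothetical set of protected key-node types. The loss is the standard margin-based contrastive loss on Euclidean distance, which may differ from the paper's exact formulation. The names `build_graph`, `contrastive_loss`, and `f1` are illustrative.

```python
import ast
import math
from typing import List, Optional, Tuple

# Assumed control-flow node types that the hypothetical GLOBAL node connects to.
CONTROL_FLOW_TYPES = (ast.If, ast.For, ast.While, ast.Return, ast.Break, ast.Continue)

# Hypothetical "key node protection" rule: nodes of these types are never pruned.
PROTECTED_TYPES = CONTROL_FLOW_TYPES + (ast.FunctionDef, ast.Call, ast.Compare, ast.BinOp)

# Hypothetical noise: Python-specific load/store/del context markers.
NOISE_TYPES = (ast.expr_context,)


def build_graph(source: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Return (labels, edges) for a pruned AST graph plus one GLOBAL node."""
    labels: List[str] = []
    edges: List[Tuple[int, int]] = []
    control_flow_ids: List[int] = []

    def visit(node: ast.AST, parent_id: Optional[int]) -> None:
        # Drop noisy nodes unless the protection rule covers them.
        if isinstance(node, NOISE_TYPES) and not isinstance(node, PROTECTED_TYPES):
            return
        node_id = len(labels)
        labels.append(type(node).__name__)
        if parent_id is not None:
            edges.append((parent_id, node_id))  # parent -> child AST edge
        if isinstance(node, CONTROL_FLOW_TYPES):
            control_flow_ids.append(node_id)
        for child in ast.iter_child_nodes(node):
            visit(child, node_id)

    visit(ast.parse(source), None)

    # Add a single global node with an edge to every control-flow node.
    global_id = len(labels)
    labels.append("GLOBAL")
    edges.extend((global_id, cf_id) for cf_id in control_flow_ids)
    return labels, edges


def contrastive_loss(emb_a, emb_b, is_clone: bool, margin: float = 1.0) -> float:
    """Margin-based contrastive loss on the Euclidean distance of two embeddings."""
    d = math.dist(emb_a, emb_b)
    if is_clone:
        return d ** 2  # pull clone pairs together
    return max(0.0, margin - d) ** 2  # push non-clones at least `margin` apart


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


labels, edges = build_graph("def f(x):\n    if x > 0:\n        return x\n    return -x\n")
print(len(labels), len(edges))
print(round(f1(0.95, 0.98), 2))
```

As a consistency check, `f1(0.95, 0.98)` is about 0.965, which rounds to the reported F1-score of 0.96. In a real system the embeddings passed to `contrastive_loss` would come from a graph matching network applied to the two pruned graphs; here they are plain coordinate lists.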
Pages: 15
Related papers (38 in total)
  • [21] Pathways to Leverage Transcompiler based Data Augmentation for Cross-Language Clone Detection
    Pinku, Subroto Nag
    Mondal, Debajyoti
    Roy, Chanchal K.
    2023 IEEE/ACM 31ST INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION, ICPC, 2023, : 169 - 180
  • [22] Flowchart-Based Cross-Language Source Code Similarity Detection
    Zhang, Feng
    Li, Guofan
    Liu, Cong
    Song, Qian
    SCIENTIFIC PROGRAMMING, 2020, 2020
  • [23] Cross-language Code Coupling Detection: A Preliminary Study on Android Applications
    Shen, Bo
    Zhang, Wei
    Yu, Ailun
    Wei, Zhao
    Liang, Guangtai
    Zhao, Haiyan
    Jin, Zhi
    2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021), 2021, : 378 - 389
  • [24] Translate-Distill: Learning Cross-Language Dense Retrieval by Translation and Distillation
    Yang, Eugene
    Lawrie, Dawn
    Mayfield, James
    Oard, Douglas W.
    Miller, Scott
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 50 - 65
  • [25] A systematic study of knowledge graph analysis for cross-language plagiarism detection
    Franco-Salvador, Marc
    Rosso, Paolo
    Montes-y-Gomez, Manuel
    INFORMATION PROCESSING & MANAGEMENT, 2016, 52 (04) : 550 - 570
  • [26] Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code
    Bui, Nghi D. Q.
    Jiang, Lingxiao
    2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING: NEW IDEAS AND EMERGING TECHNOLOGIES RESULTS (ICSE-NIER), 2018, : 33 - 36
  • [27] Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph
    Jiang, Zhuoren
    Yin, Yue
    Gao, Liangcai
    Lu, Yao
    Liu, Xiaozhong
    ACM/SIGIR PROCEEDINGS 2018, 2018, : 635 - 644
  • [28] Building Bridges in Computer Networks: A Nifty Assignment for Cross-Language Learning and Code Refactoring
    Akhmetov, Ildar
    Schmidt, Logan W.
    PROCEEDINGS OF THE 26TH WESTERN CANADIAN CONFERENCE ON COMPUTING EDUCATION, WCCCE 2024, 2024
  • [29] Detection of Software Security Weaknesses Using Cross-Language Source Code Representation (CLaSCoRe)
    Zaharia, Sergiu
    Rebedea, Traian
    Trausan-Matu, Stefan
    APPLIED SCIENCES-BASEL, 2023, 13 (13)
  • [30] Java Code Clone Detection by Exploiting Semantic and Syntax Information From Intermediate Code-Based Graph
    Yuan, Dawei
    Fang, Sen
    Zhang, Tao
    Xu, Zhou
    Luo, Xiapu
    IEEE TRANSACTIONS ON RELIABILITY, 2023, 72 (02) : 511 - 526