Semantic Code Clone Detection Based on Community Detection

Cited by: 0
Authors
Wan, Zexuan [1]
Xie, Chunli [1]
Lv, Quanrun [1]
Fan, Yasheng [1]
Affiliations
[1] Jiangsu Normal Univ, Sch Comp Sci & Technol, Xuzhou 221116, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Code clone detection; semantic clone; community detection; centrality analysis; Siamese network
DOI
10.1142/S0218194024500323
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semantic code clone detection aims to find code snippets that differ structurally or syntactically but are semantically identical. It plays an important role in software reuse and code compression. Many existing studies achieve good performance on non-semantic clones, but detecting semantic clones remains a challenging task. Recently, several works have used trees or graphs, such as the Abstract Syntax Tree (AST), Control Flow Graph (CFG), or Program Dependency Graph (PDG), to extract semantic information from source code. To reduce the complexity of trees and graphs, some studies transform them into node sequences; however, this transformation loses some semantic information. To address this issue, we propose a novel high-performance method that uses community detection to extract features of the AST while preserving its semantic information. First, based on the AST of the source code, we apply community detection to split the AST into subtrees that capture the underlying semantic information of different code blocks, and we use centrality analysis to quantify that information as the weights of AST nodes. Then, the AST is converted into a sequence of weighted tokens, and a Siamese neural network model measures the similarity of token sequences to detect semantic code clones. Finally, to evaluate our approach, we conduct experiments on two standard benchmark datasets, Google Code Jam (GCJ) and BigCloneBench (BCB). Experimental results show that our model outperforms eight publicly available state-of-the-art methods in detecting code clones. It is also five times faster than the tree-based method ASTNN.
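
The pipeline described in the abstract (split the AST into communities, weight nodes by centrality, emit a weighted token sequence) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the abstract does not name the community detection algorithm or the centrality measure, so the sketch uses a Girvan-Newman-style split (repeatedly cutting the tree edge with the highest edge betweenness) and degree centrality within each community. It parses Python source with the standard `ast` module, and the function names `parse_tree`, `split_communities`, `weighted_tokens`, and `similarity` are hypothetical. The cosine similarity at the end is only a stand-in for the Siamese neural network, which is not reproduced here.

```python
import ast
import math
from collections import Counter, defaultdict


def parse_tree(source):
    """Return (labels, parent) for the AST in pre-order; parent[0] is None."""
    labels, parent = [], []

    def visit(node, parent_idx):
        idx = len(labels)
        labels.append(type(node).__name__)
        parent.append(parent_idx)
        for child in ast.iter_child_nodes(node):
            visit(child, idx)

    visit(ast.parse(source), None)
    return labels, parent


def split_communities(parent, n_communities):
    """Assumed stand-in for community detection: Girvan-Newman-style splitting.
    Returns, for every node, the index of the root of its subtree (community)."""
    n = len(parent)
    cut = set()  # child endpoints of the removed (parent, child) edges

    def roots():
        root = [0] * n
        for v in range(1, n):  # pre-order: a parent always precedes its children
            root[v] = v if v in cut else root[parent[v]]
        return root

    while len(cut) + 1 < n_communities:
        size = [1] * n  # nodes still attached at or below v
        for v in range(n - 1, 0, -1):
            if v not in cut:
                size[parent[v]] += size[v]
        root = roots()
        best, best_score = None, 0
        for v in range(1, n):
            if v in cut:
                continue
            # In a tree, an edge's betweenness is the number of node pairs it separates.
            score = size[v] * (size[root[v]] - size[v])
            if score > best_score:
                best, best_score = v, score
        if best is None:  # no edges left to cut
            break
        cut.add(best)
    return roots()


def weighted_tokens(source, n_communities=3):
    """Pre-order token sequence; weight = degree centrality inside the community."""
    labels, parent = parse_tree(source)
    community = split_communities(parent, n_communities)
    degree = [0] * len(labels)
    for v in range(1, len(labels)):
        if community[v] == community[parent[v]]:  # edge was not cut
            degree[v] += 1
            degree[parent[v]] += 1
    sizes = Counter(community)
    return [(labels[v], degree[v] / max(1, sizes[community[v]] - 1))
            for v in range(len(labels))]


def similarity(src_a, src_b):
    """Cosine similarity of weighted token bags (stand-in for the Siamese model)."""
    def bag(src):
        weights = defaultdict(float)
        for token, w in weighted_tokens(src):
            weights[token] += w
        return weights

    a, b = bag(src_a), bag(src_b)
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return dot / norm if norm else 0.0


loop_sum = "def f(xs):\n    t = 0\n    for x in xs:\n        t += x\n    return t\n"
builtin_sum = "def g(xs):\n    return sum(xs)\n"
print(weighted_tokens(loop_sum)[:5])
print(similarity(loop_sum, builtin_sum))
```

The two sample functions are a semantic clone pair (both sum a list) with different syntax. A bag-of-tokens cosine score discards token order, so it is only a rough proxy for the sequence model named in the abstract.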
Pages: 1661-1692
Number of pages: 32