Efficient transformer with code token learner for code clone detection

Cited by: 11
Authors
Zhang, Aiping [1]
Fang, Liming [1,2]
Ge, Chunpeng [1]
Li, Piji [1]
Liu, Zhe [1]
Affiliations
[1] Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, People's Republic of China
[2] Nanjing University of Aeronautics and Astronautics, Shenzhen Research Institute, Shenzhen, Guangdong, People's Republic of China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
Code clone detection; Code token learner; Efficient transformer;
DOI
10.1016/j.jss.2022.111557
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Deep learning techniques have achieved promising results in code clone detection over the past decade. Unfortunately, current deep learning-based methods rarely consider the modeling of long code fragments explicitly. Worse, code length keeps growing as the functionality required of software becomes more complex. Modeling the relationships between code tokens to capture their long-range dependencies is therefore crucial for comprehensively capturing the information in a code fragment. In this work, we resort to the Transformer to capture long-range dependencies within a code fragment, which, however, incurs a huge computational cost for long code fragments. To make it possible to apply the Transformer efficiently, we propose a code token learner that automatically and substantially reduces the number of feature tokens. In addition, considering the tree structure of the abstract syntax tree, we present a tree-based position embedding to encode the position of each token in the input. Apart from the Transformer, which captures the dependencies within a code fragment, we further leverage a cross-code attention module to capture the similarities between two code fragments. Our method reduces the computational cost of using the Transformer by 97% while achieving superior performance compared with state-of-the-art methods. Our code is available at https://github.com/ArcticHare105/Code-Token-Learner. © 2022 Elsevier Inc. All rights reserved.
Pages: 12
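The abstract's core efficiency idea, reducing the number of feature tokens before they reach the Transformer, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it is a generic attention-pooling reduction written under stated assumptions. The names `reduce_tokens`, `selectors`, and `attention_pairs`, the random "learned" parameters, and the sizes `N`, `M`, and `D` are all hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reduce_tokens(tokens, selectors):
    """Pool N token vectors into M summary vectors.

    tokens:    N vectors of dimension d (e.g. embedded code tokens).
    selectors: M vectors of dimension d; hypothetical stand-ins for
               learned parameters, one per output token.
    """
    dim = len(tokens[0])
    reduced = []
    for selector in selectors:
        # Score every input token against this selector (dot product).
        scores = [sum(s * t for s, t in zip(selector, token)) for token in tokens]
        weights = softmax(scores)  # attention weights over the N inputs
        # The weighted average of the inputs forms one output token.
        reduced.append([
            sum(w * token[i] for w, token in zip(weights, tokens))
            for i in range(dim)
        ])
    return reduced


def attention_pairs(num_tokens):
    """Number of pairwise scores in one full self-attention layer."""
    return num_tokens * num_tokens


rng = random.Random(0)
N, M, D = 512, 16, 8  # illustrative sizes, not taken from the paper
tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
selectors = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(M)]

reduced = reduce_tokens(tokens, selectors)
saving = 1 - attention_pairs(M) / attention_pairs(N)
print(len(reduced), len(reduced[0]))
print(f"pairwise-score reduction: {saving:.2%}")
```

Because self-attention compares every token with every other token, its cost grows quadratically with sequence length, so shrinking the sequence before the attention layers removes most of the pairwise work. The reduction printed here follows only from the illustrative sizes above and is unrelated to the 97% figure reported in the abstract.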