Efficient transformer with code token learner for code clone detection

Cited by: 11
Authors
Zhang, Aiping [1]
Fang, Liming [1, 2]
Ge, Chunpeng [1]
Li, Piji [1]
Liu, Zhe [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Shenzhen Res Inst, Shenzhen, Guangdong, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Code clone detection; Code token learner; Efficient transformer
DOI
10.1016/j.jss.2022.111557
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Deep learning techniques have achieved promising results in code clone detection over the past decade. However, current deep learning-based methods rarely model long code fragments explicitly, and code length keeps growing as functional requirements become more complex. Modeling the relationships between code tokens to capture their long-range dependencies is therefore crucial for comprehensively capturing the information in a code fragment. In this work, we use the Transformer to capture long-range dependencies within a code fragment, which, however, incurs a huge computational cost for long fragments. To apply the Transformer efficiently, we propose a code token learner that automatically and substantially reduces the number of feature tokens. In addition, considering the tree structure of the abstract syntax tree, we present a tree-based position embedding to encode the position of each token in the input. Beyond the Transformer, which captures dependencies within a code fragment, we further use a cross-code attention module to capture the similarities between two code fragments. Our method reduces the computational cost of using the Transformer by 97% while achieving superior performance compared with state-of-the-art methods. Our code is available at https://github.com/ArcticHare105/Code-Token-Learner. (c) 2022 Elsevier Inc. All rights reserved.
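The abstract does not spell out how the code token learner works. As a rough illustration of the general idea only (pooling many feature tokens into a few before attention is applied), here is a minimal sketch in plain Python. It uses generic attention-weighted pooling, which is an assumption and not the authors' implementation; the names `learn_tokens`, `queries`, and `attention_pairs`, and the sizes `N`, `K`, `D`, are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def learn_tokens(tokens, queries):
    """Pool N input token vectors into K summary tokens.

    tokens:  N vectors of dimension d (e.g. embedded code tokens).
    queries: K hypothetical weight vectors of dimension d, one per output
             token; in a trained model these would be learned parameters.
    """
    dim = len(tokens[0])
    pooled = []
    for query in queries:
        # Score every input token against this query, then normalise.
        scores = [sum(q * x for q, x in zip(query, tok)) for tok in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all input tokens.
        pooled.append([
            sum(w * tok[j] for w, tok in zip(weights, tokens))
            for j in range(dim)
        ])
    return pooled


def attention_pairs(num_tokens):
    """Number of pairwise scores in one full self-attention map."""
    return num_tokens * num_tokens


rng = random.Random(0)
N, K, D = 512, 16, 8  # illustrative sizes, not taken from the paper
tokens = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]
queries = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]

reduced = learn_tokens(tokens, queries)
saving = 1 - attention_pairs(K) / attention_pairs(N)
print(len(reduced), len(reduced[0]), f"{saving:.4f}")
```

The `saving` value counts only the entries of a single self-attention map under these made-up sizes. It is not a reproduction of the 97% figure reported in the abstract, which refers to the authors' own measurements.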
Pages: 12