CCStokener: Fast yet accurate code clone detection with semantic token

Cited by: 6
Authors
Wang, Wenjie [1, 2]
Deng, Zihan [1, 2]
Xue, Yinxing [1]
Xu, Yun [1, 2]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
[2] Key Lab High Performance Comp Anhui Prov, Hefei 230027, Peoples R China
Funding
National Science Foundation (United States);
Keywords
Code clone detection; Semantic token; Near-miss clones; Scalable detection;
DOI
10.1016/j.jss.2023.111618
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Code clone detection is the discovery of identical or similar code fragments in a code repository. AST-based, PDG-based, and DL-based tools achieve good results on near-miss clones (i.e., clones with small differences or gaps) by using syntactic and semantic information, but they are difficult to apply to large code repositories because of their high time complexity. Traditional token-based tools detect clones rapidly by building a low-cost index (i.e., of low-frequency or k-line tokens) over sequential source code, but most of them detect near-miss clones poorly because they lack semantic information. In this study, we propose CCStokener, a fast yet accurate code clone detection tool based on the semantic token. The idea behind the semantic token is to strengthen the detection capability of token-based tools by complementing the traditional token with semantic information, namely the structural information around the token and its dependencies on other tokens in the form of n-grams. Specifically, we extract the types of the relevant nodes on the AST path of every token and transform these types into a fixed-dimensional vector, then model the token's semantic information by applying n-grams to its related tokens. Our tool also adopts and improves the location-filtration-verification process used in CCAligner and LVMapper, during which we build a low-cost k-tokens index to locate candidate code blocks quickly and speed up detection. Our experiments show that CCStokener detects more near-miss clone pairs with high accuracy: it exhibits the best recall on Moderately Type-3 clones and detects more true positive clones on four open-source Java projects. Moreover, CCStokener attains the best generalization and transferability compared with two DL-based tools (i.e., ASTNN and TBCCD). © 2023 Elsevier Inc. All rights reserved.
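The location and verification steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the CCStokener implementation: the function names, the window size k=3, the Jaccard-based verification, and the 0.5 threshold are all assumptions chosen for illustration, and the semantic-token enrichment (AST path types and n-grams) is omitted entirely. The sketch only shows how an inverted index over runs of k consecutive tokens can narrow the search to block pairs that share at least one run before a more expensive comparison is made.

```python
from collections import defaultdict
from itertools import combinations


def k_token_windows(tokens, k):
    """Return the set of all runs of k consecutive tokens in a block."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def locate_candidates(blocks, k=3):
    """Location step (hypothetical): index every k-token window,
    then pair up the blocks that share at least one window."""
    index = defaultdict(set)  # window -> ids of blocks containing it
    for block_id, tokens in blocks.items():
        for window in k_token_windows(tokens, k):
            index[window].add(block_id)
    candidates = set()
    for block_ids in index.values():
        candidates.update(combinations(sorted(block_ids), 2))
    return candidates


def overlap_similarity(a, b, k=3):
    """Verification step (simplified assumption): Jaccard similarity
    of the two blocks' k-token window sets."""
    wa, wb = k_token_windows(a, k), k_token_windows(b, k)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def detect_clones(blocks, k=3, threshold=0.5):
    """Keep only the candidate pairs whose similarity reaches the threshold."""
    return sorted(
        pair for pair in locate_candidates(blocks, k)
        if overlap_similarity(blocks[pair[0]], blocks[pair[1]], k) >= threshold
    )


# Toy token sequences; a real tool would obtain them from a lexer.
blocks = {
    "sum_a": "int s = 0 ; for ( int i = 0 ; i < n ; i ++ ) s += a [ i ] ;".split(),
    "sum_b": "int t = 0 ; for ( int i = 0 ; i < n ; i ++ ) t += a [ i ] ;".split(),
    "print": "System . out . println ( msg ) ;".split(),
}
print(detect_clones(blocks))  # [('sum_a', 'sum_b')]
```

Because only blocks that share an indexed window are ever compared, the unrelated "print" block is never paired with the two summation loops. The renamed variable in "sum_b" lowers the overlap but does not eliminate it, which is the kind of small difference the abstract calls a near-miss clone.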
Pages: 16