Automatic detection of Long Method and God Class code smells through neural source code embeddings

Times cited: 29
Authors
Kovacevic, Aleksandar [1]
Slivka, Jelena [1]
Vidakovic, Dragan [1]
Grujic, Katarina-Glorija [1]
Luburic, Nikola [1]
Prokic, Simona [1]
Sladic, Goran [1]
Affiliation
[1] Univ Novi Sad, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia
Keywords
Code smell detection; Neural source code embeddings; Code metrics; Machine learning; Software engineering; IMPACT
DOI
10.1016/j.eswa.2022.117607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code smells are structures in code that often harm its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic detectors. Traditional code smell detectors employ metric-based heuristics, but researchers have recently adopted Machine-Learning (ML) based approaches. This paper compares the performance of multiple ML-based code smell detection models against multiple metric-based heuristics for detecting the God Class and Long Method code smells. We assess the effectiveness of different source code representations for ML: we evaluate traditionally used code metrics against code embeddings (code2vec, code2seq, and CuBERT). To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. This approach let us leverage the power of transfer learning: our study is the first to explore whether the knowledge mined by code understanding models can be transferred to code smell detection. A secondary contribution of our research is the systematic evaluation of code smell detection approaches on the same large-scale, manually labeled MLCQ dataset. Almost every study that proposes a detection approach tests it on a dataset unique to that study. Consequently, the reported performances cannot be directly compared to identify the best-performing approach.
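As a minimal sketch of the two detector families the abstract contrasts, assuming hypothetical metrics, thresholds, and toy vectors (none of them taken from the paper): a metric-based heuristic flags a method when a metric exceeds a fixed threshold, while an ML-based detector learns a decision rule from labeled feature vectors, which may be code metrics or source code embeddings such as those produced by code2vec. A simple nearest-centroid classifier stands in for the ML models here; it is not the model used in the study.

```python
from dataclasses import dataclass
from math import dist


# Hypothetical method-level metrics; the thresholds used below are
# illustrative assumptions, not values from the paper or any specific tool.
@dataclass
class MethodMetrics:
    loc: int          # lines of code
    cyclomatic: int   # cyclomatic complexity
    max_nesting: int  # deepest nesting level


def heuristic_long_method(m: MethodMetrics, loc_max: int = 50,
                          cyclo_max: int = 10, nesting_max: int = 4) -> bool:
    """Metric-based rule: flag a method if it exceeds any threshold."""
    return m.loc > loc_max or m.cyclomatic > cyclo_max or m.max_nesting > nesting_max


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def fit_nearest_centroid(X, y):
    """Learn one centroid per label from labeled feature vectors."""
    by_label = {}
    for vec, label in zip(X, y):
        by_label.setdefault(label, []).append(vec)
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def predict(model, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], vec))


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for embeddings (real code2vec
    # vectors have hundreds of dimensions); True means "smelly".
    X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.1, 0.2, 0.1], [0.2, 0.1, 0.0]]
    y = [True, True, False, False]
    model = fit_nearest_centroid(X, y)
    print(predict(model, [0.85, 0.75, 0.65]))  # True
    print(heuristic_long_method(MethodMetrics(loc=120, cyclomatic=4, max_nesting=2)))  # True
```

The heuristic needs no training data but depends entirely on hand-picked thresholds, whereas the learned detector adapts to whatever representation it is given, which is why the choice between metrics and embeddings matters for the ML side of the comparison.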
Pages: 18