Automatic detection of Long Method and God Class code smells through neural source code embeddings

Cited by: 29

Authors:

Kovacevic, Aleksandar [1]

Slivka, Jelena [1]

Vidakovic, Dragan [1]

Grujic, Katarina-Glorija [1]

Luburic, Nikola [1]

Prokic, Simona [1]

Sladic, Goran [1]

Affiliation:

[1] Univ Novi Sad, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2022 / Vol. 204

Keywords:

Code smell detection; Neural source code embeddings; Code metrics; Machine learning; Software engineering; IMPACT;

DOI:

10.1016/j.eswa.2022.117607

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Code smells are structures in code that often harm its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic detectors. Traditional code smell detectors employ metric-based heuristics, but researchers have recently adopted a Machine-Learning (ML) based approach. This paper compares the performance of multiple ML-based code smell detection models against multiple metric-based heuristics for detecting the God Class and Long Method code smells. We assess the effectiveness of different source code representations for ML by comparing traditionally used code metrics against code embeddings (code2vec, code2seq, and CuBERT). To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. This approach helped us leverage the power of transfer learning: our study is the first to explore whether the knowledge mined from code understanding models can be transferred to code smell detection. A secondary contribution of our research is the systematic evaluation of the effectiveness of code smell detection approaches on the same large-scale, manually labeled MLCQ dataset. Almost every study that proposes a detection approach tests it on a dataset unique to that study. Consequently, the reported performances cannot be compared directly to determine the best-performing approach.
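
The contrast the abstract draws between metric-based heuristics and ML models over code embeddings can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the thresholds `loc_max` and `cyclo_max`, the two-dimensional toy vectors standing in for code2vec/code2seq/CuBERT embeddings, and the nearest-centroid classifier, which is a deliberately simple stand-in for whatever ML models the study actually trained.

```python
from math import sqrt


def long_method_heuristic(loc, cyclomatic, loc_max=50, cyclo_max=10):
    """Metric-based rule: flag a method when both metrics exceed their limits.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    return loc > loc_max and cyclomatic > cyclo_max


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NearestCentroid:
    """Toy classifier over embedding vectors (stand-in for a real ML model)."""

    def fit(self, X, y):
        # One centroid per label, computed from that label's training vectors.
        self.centroids = {
            label: centroid([x for x, l in zip(X, y) if l == label])
            for label in set(y)
        }
        return self

    def predict(self, x):
        # Assign the label whose centroid lies closest to x.
        return min(self.centroids, key=lambda label: distance(x, self.centroids[label]))


# Toy "embeddings"; real pre-trained embeddings have hundreds of dimensions.
train_X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
train_y = ["smelly", "smelly", "clean", "clean"]
clf = NearestCentroid().fit(train_X, train_y)
```

In this sketch the heuristic needs only hand-picked thresholds, whereas the classifier needs labeled examples (such as those the MLCQ dataset provides) but can work on any vector representation of the code, which is what makes a comparison across metrics and embeddings possible.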


Pages: 18
