Automatic detection of Long Method and God Class code smells through neural source code embeddings

Cited by: 29
Authors
Kovacevic, Aleksandar [1 ]
Slivka, Jelena [1 ]
Vidakovic, Dragan [1 ]
Grujic, Katarina-Glorija [1 ]
Luburic, Nikola [1 ]
Prokic, Simona [1 ]
Sladic, Goran [1 ]
Affiliations
[1] Univ Novi Sad, Fac Tech Sci, Trg Dositeja Obradovica 6, Novi Sad 21000, Serbia
Keywords
Code smell detection; Neural source code embeddings; Code metrics; Machine learning; Software engineering; IMPACT;
DOI
10.1016/j.eswa.2022.117607
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Code smells are structures in code that often harm its quality. Manually detecting code smells is challenging, so researchers have proposed many automatic detectors. Traditional code smell detectors employ metric-based heuristics, but researchers have recently adopted Machine Learning (ML) based approaches. This paper compares the performance of multiple ML-based code smell detection models against multiple metric-based heuristics for the detection of the God Class and Long Method code smells. We assess the effectiveness of different source code representations for ML by comparing traditionally used code metrics against code embeddings (code2vec, code2seq, and CuBERT). To the best of our knowledge, this study is the first to evaluate the effectiveness of pre-trained neural source code embeddings for code smell detection. This approach lets us leverage transfer learning: our study is the first to explore whether the knowledge mined from code understanding models can be transferred to code smell detection. A secondary contribution of our research is the systematic evaluation of code smell detection approaches on the same large-scale, manually labeled MLCQ dataset. Almost every study that proposes a detection approach tests it on a dataset unique to that study. Consequently, the reported performances cannot be compared directly to determine the best-performing approach.
Pages: 18
Related papers
50 in total
  • [1] A detection tool for code bad smells in java source code
    Gupta, Aakanshi
    Suri, Bharti
    Wadhwa, Bimlesh
    Advances in Intelligent Systems and Computing, 2021, 1086 : 479 - 488
  • [2] Automatic Detection of Architectural Bad Smells through Semantic Representation of Code
    Pigazzini, Ilaria
    13TH EUROPEAN CONFERENCE ON SOFTWARE ARCHITECTURE (ECSA 2019), VOL 2, 2019, : 59 - 62
  • [3] Reducing Subjectivity in Code Smells Detection: Experimenting with the Long Method
    Bryton, Sergio
    Brito e Abreu, Fernando
    Monteiro, Miguel
    QUATIC 2010: SEVENTH INTERNATIONAL CONFERENCE ON THE QUALITY OF INFORMATION AND COMMUNICATIONS TECHNOLOGY, 2010, : 337 - 342
  • [4] ACE: Anomalous code elimination through automatic detection within source code
    Stange, M
    PROCEEDINGS OF THE IEEE SOUTHEASTCON 2004: ENGINEERING CONNECTS, 2004, : 67 - 76
  • [5] Automatic detection of bad smells in code: An experimental assessment
    Fontana, Francesca Arcelli
    Braione, Pietro
    Zanoni, Marco
    JOURNAL OF OBJECT TECHNOLOGY, 2012, 11 (02):
  • [6] Automatic Human-Like Detection of Code Smells
    Soomlek, Chitsutha
    van Rijn, Jan N.
    Bonsangue, Marcello M.
    DISCOVERY SCIENCE (DS 2021), 2021, 12986 : 19 - 28
  • [7] Automatic detection of Feature Envy and Data Class code smells using machine learning
    Skipina, Milica
    Slivka, Jelena
    Luburic, Nikola
    Kovacevic, Aleksandar
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [8] Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C#
    Aleksandar Kovačević
    Nikola Luburić
    Jelena Slivka
    Simona Prokić
    Katarina-Glorija Grujić
    Dragan Vidaković
    Goran Sladić
    Neural Computing and Applications, 2024, 36 : 9203 - 9220
  • [9] Automatic detection of code smells using metrics and CodeT5 embeddings: a case study in C#
    Kovacevic, Aleksandar
    Luburic, Nikola
    Slivka, Jelena
    Prokic, Simona
    Grujic, Katarina-Glorija
    Vidakovic, Dragan
    Sladic, Goran
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (16): : 9203 - 9220
  • [10] On the Embeddings of Variables in Recurrent Neural Networks for Source Code
    Chirkova, Nadezhda
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 2679 - 2689