Comparing and experimenting machine learning techniques for code smell detection

Cited by: 257
Authors
Fontana, Francesca Arcelli [1]
Mantyla, Mika V. [4,5]
Zanoni, Marco [2]
Marino, Alessandro [3]
Affiliations
[1] Univ Milano Bicocca, Dept Comp Sci, Milan, Italy
[2] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[3] Univ Milano Bicocca, Milan, Italy
[4] Univ Oulu, Software Engn, Oulu, Finland
[5] Aalto Univ, Helsinki, Finland
Keywords
Code smells detection; Machine learning techniques; Benchmark for code smell detection; BAD SMELLS; QUALITY; CLASSIFICATION
DOI
10.1007/s10664-015-9378-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Several code smell detection tools have been developed, and they provide different results because smells can be subjectively interpreted, and hence detected, in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smells. We experiment with 16 different machine learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performances in the cross-validation data set, yet the highest performances were obtained by J48 and Random Forest, while the worst performances were achieved by support vector machines. However, the lower prevalence of code smells, i.e., imbalanced data, in the entire data set caused varying performances that need to be addressed in future studies. We conclude that the application of machine learning to the detection of these code smells can provide high accuracy (>96 %), and that only a hundred training examples are needed to reach at least 95 % accuracy.
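The abstract's caution about imbalanced data can be made concrete with a small sketch. It assumes a hypothetical toy data set of 100 methods, 15 of them labelled Long Method using a single lines-of-code metric, and runs stratified 5-fold cross-validation for two illustrative models: a one-split decision stump (the simplest form of decision tree, not J48 or Random Forest themselves) and a majority-class baseline. The data, the names `SAMPLES`, `stratified_folds`, `fit_stump`, `fit_majority`, `cross_validate`, and the choice of k = 5 are assumptions made for illustration only; the paper's 16 algorithms, metrics, and data are not reproduced here.

```python
from typing import List, Tuple

# Hypothetical toy data, NOT the paper's data set: each sample is
# (lines of code of a method, label), where label 1 means "Long Method".
# Only 15 of 100 samples are smelly, mimicking a low prevalence of smells.
Sample = Tuple[int, int]
SAMPLES: List[Sample] = [(loc, 1 if loc > 85 else 0) for loc in range(1, 101)]


def stratified_folds(samples: List[Sample], k: int) -> List[List[Sample]]:
    """Split samples into k folds, dealing each class round-robin so every
    fold keeps roughly the same share of smelly samples."""
    folds: List[List[Sample]] = [[] for _ in range(k)]
    for label in (0, 1):
        members = [s for s in samples if s[1] == label]
        for i, s in enumerate(members):
            folds[i % k].append(s)
    return folds


def fit_stump(train: List[Sample]) -> int:
    """One-split decision rule: predict smelly when loc >= t, picking the t
    among training values with the best training accuracy (ties go to the
    smallest t, since max() returns the first maximal candidate)."""
    candidates = sorted({loc for loc, _ in train})
    return max(candidates,
               key=lambda t: sum((loc >= t) == bool(y) for loc, y in train))


def fit_majority(train: List[Sample]) -> int:
    """Baseline that always predicts the most frequent training label."""
    positives = sum(y for _, y in train)
    return 1 if positives * 2 > len(train) else 0


def cross_validate(samples: List[Sample], k: int = 5) -> dict:
    """Pooled accuracy and smell-class recall for both models over k folds."""
    folds = stratified_folds(samples, k)
    stats = {"stump": [0, 0], "majority": [0, 0]}  # [correct, true positives]
    for i, test in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        t = fit_stump(train)
        majority = fit_majority(train)
        for loc, y in test:
            for name, pred in (("stump", int(loc >= t)), ("majority", majority)):
                stats[name][0] += pred == y
                stats[name][1] += pred == 1 and y == 1
    n_pos = sum(y for _, y in samples)
    return {name: {"accuracy": c / len(samples), "recall": tp / n_pos}
            for name, (c, tp) in stats.items()}


if __name__ == "__main__":
    for name, m in cross_validate(SAMPLES).items():
        print(f"{name:8s} accuracy={m['accuracy']:.2f} recall={m['recall']:.2f}")
```

With these toy numbers the majority baseline scores 0.85 accuracy while detecting no smells at all (recall 0.00), whereas the stump scores 0.99 accuracy and 0.93 recall. This is why, on imbalanced data, accuracy is best read alongside per-class measures such as recall on the smell class.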
Pages: 1143-1191
Number of pages: 49