Comparing and experimenting machine learning techniques for code smell detection

Cited by: 257
Authors
Fontana, Francesca Arcelli [1]
Mäntylä, Mika V. [4, 5]
Zanoni, Marco [2]
Marino, Alessandro [3]
Affiliations
[1] Univ Milano Bicocca, Dept Comp Sci, Milan, Italy
[2] Univ Milano Bicocca, Dept Informat Syst & Commun, Milan, Italy
[3] Univ Milano Bicocca, Milan, Italy
[4] Univ Oulu, Software Engn, Oulu, Finland
[5] Aalto Univ, Helsinki, Finland
Keywords
Code smells detection; Machine learning techniques; Benchmark for code smell detection; BAD SMELLS; QUALITY; CLASSIFICATION
DOI
10.1007/s10664-015-9378-4
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Several code smell detection tools have been developed providing different results, because smells can be subjectively interpreted, and hence detected, in different ways. In this paper, we perform what is, to the best of our knowledge, the largest experiment applying machine learning algorithms to code smells. We experiment with 16 different machine-learning algorithms on four code smells (Data Class, Large Class, Feature Envy, Long Method) and 74 software systems, with 1986 manually validated code smell samples. We found that all algorithms achieved high performance on the cross-validation data set; the highest performance was obtained by J48 and Random Forest, while the worst performance was achieved by support vector machines. However, the lower prevalence of code smells, i.e., imbalanced data, in the entire data set caused varying performance that needs to be addressed in future studies. We conclude that the application of machine learning to the detection of these code smells can provide high accuracy (>96%), and only a hundred training examples are needed to reach at least 95% accuracy.
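The abstract's remark about low smell prevalence can be illustrated with a small sketch. The figures below are hypothetical and are not taken from the paper: they assume 1000 code elements of which 20 are smelly, a baseline that never reports a smell, and an invented detector. The point is only that accuracy alone can look high on imbalanced data, while precision, recall, and F-measure expose the difference.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(actual: list[bool], predicted: list[bool]) -> Scores:
    """Compute binary classification scores, treating True as 'smelly'."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    tn = sum((not a) and (not p) for a, p in pairs)
    fp = sum((not a) and p for a, p in pairs)
    fn = sum(a and (not p) for a, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return Scores(accuracy, precision, recall, f_measure)


# Hypothetical data set (an assumption): 20 smelly elements out of 1000.
actual = [True] * 20 + [False] * 980

# Trivial baseline that never reports a smell.
baseline = evaluate(actual, [False] * 1000)

# Invented detector: finds 15 of the 20 smells and raises 5 false alarms.
detector_pred = [True] * 15 + [False] * 5 + [True] * 5 + [False] * 975
detector = evaluate(actual, detector_pred)

print(baseline)
print(detector)
```

Under these assumed numbers the baseline reaches 98% accuracy with zero recall, whereas the detector's accuracy is only one point higher even though it recovers three quarters of the smells.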
Pages: 1143 - 1191
Page count: 49
Related papers
50 in total
  • [1] Comparing and experimenting machine learning techniques for code smell detection
    Francesca Arcelli Fontana
    Mika V. Mäntylä
    Marco Zanoni
    Alessandro Marino
    Empirical Software Engineering, 2016, 21 : 1143 - 1191
  • [2] Comparing Heuristic and Machine Learning Approaches for Metric-Based Code Smell Detection
    Pecorelli, Fabiano
    Palomba, Fabio
    Di Nucci, Dario
    De Lucia, Andrea
    2019 IEEE/ACM 27TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2019), 2019 : 93 - 104
  • [3] Comparing Within- and Cross-Project Machine Learning Algorithms for Code Smell Detection
    De Stefano, Manuel
    Pecorelli, Fabiano
    Palomba, Fabio
    De Lucia, Andrea
    MALTESQUE '21: PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING TECHNIQUES FOR SOFTWARE QUALITY EVOLUTION, 2021 : 1 - 6
  • [4] Code smell severity classification using machine learning techniques
    Fontana, Francesca Arcelli
    Zanoni, Marco
    KNOWLEDGE-BASED SYSTEMS, 2017, 128 : 43 - 58
  • [5] Boosting and Comparing Performance of Machine Learning Classifiers with Meta-heuristic Techniques to Detect Code Smell
    Jain, Shivani
    Saha, Anju
    E-INFORMATICA SOFTWARE ENGINEERING JOURNAL, 2024, 18 (01)
  • [6] Improving performance with hybrid feature selection and ensemble machine learning techniques for code smell detection
    Jain, Shivani
    Saha, Anju
    SCIENCE OF COMPUTER PROGRAMMING, 2021, 212
  • [7] Machine learning techniques for code smell detection: A systematic literature review and meta-analysis
    Azeem, Muhammad Ilyas
    Palomba, Fabio
    Shi, Lin
    Wang, Qing
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 108 : 115 - 138
  • [8] Improving and comparing performance of machine learning classifiers optimized by swarm intelligent algorithms for code smell detection
    Jain, Shivani
    Saha, Anju
    SCIENCE OF COMPUTER PROGRAMMING, 2024, 237
  • [9] Code Smell Detection Using Ensemble Machine Learning Algorithms
    Dewangan, Seema
    Rao, Rajwant Singh
    Mishra, Alok
    Gupta, Manjari
    APPLIED SCIENCES-BASEL, 2022, 12 (20)
  • [10] Revisiting "code smell severity classification using machine learning techniques"
    Hu, Wenhua
    Liu, Lei
    Yang, Peixin
    Zou, Kuan
    Li, Jiajun
    Lin, Guancheng
    Xiang, Jianwen
    2023 IEEE 47TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE, COMPSAC, 2023 : 840 - 849